Morpheus

The Care and Feeding of Morpheus

Morpheus is the morphological analysis engine that underlies all the Perseus language tools. It is described in Crane 1991.¹ Support for Greek is detailed and robust. Support for Latin is reasonably complete, though some archaic or late forms may be missing, and the vocabulary is not as full as in Greek. Support for Italian is rudimentary.

The component parts of Morpheus are:

the endings database
the stems files
the compilation utilities
the analysis utility

All of the code is in C, and compiles and runs with gcc on every platform where it has been tested. There are also post-processing routines of various kinds which integrate Morpheus into the Perseus system; these properly belong to the text system rather than to Morpheus itself, so they will only be mentioned briefly here.

Code and data for Morpheus are stored in a single directory tree, conventionally ..../morph. The tree can be anywhere, though it is often placed in the sgml tree as a sibling of ..../sgml/texts and ..../sgml/xml. This tree has the following structure:

morph
    bin        (executables)
    src        (several subdirectories; see below)
    stemlib
        Greek
        Latin
        Italian

In what follows, these components will be described in order of increasing complexity.

Analysis: the end-user view

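The primary user interface to Morpheus itself is cruncher.² In simplest terms, it reads words from standard input and writes their possible morphological analyses to standard output. Here is an example session: the user types ai)/louros and cruncher replies with its analysis; the user then types the non-word ai)/louroz, which cruncher simply echoes back; finally ^D (end of file) ends the session.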

$ cruncher
ai)/louros
<NL>N ai)/louros  masc/fem nom sg                       os_ou</NL>
ai)/louroz
ai)/louroz
^D
$

The default language is Greek, and Greek must be entered in beta-code.³ For Italian, use the beta-code convention for accents, writing à as a\.

If you enter a word that Morpheus does not recognize, it will simply echo it back to you. This can happen when the word is mis-spelled or is not correct Greek, as in the example; it can also happen with legitimate words or forms that are not known to Morpheus. (This will be very rare in Greek, will happen occasionally in classical Latin, and will be fairly common in Italian.)

The following are the commonly used command-line switches.

Switch	Use
-L	sets language to Latin
-I	sets language to Italian
-S	turn off Strict case. For Greek, allows words with an initial capital to be recognized, so that for example the personification `*tu/xhs` at Soph. OT 1080 is recognized as the genitive singular of `tu/xh`. For languages in the Roman alphabet, allows words with initial capital or in all capitals to be recognized.
-n	ignore accents. Allows words with no accents or breathings, or with incorrect ones, to be recognized.
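The effect of -n can be pictured as comparing words with their accents and breathings removed. The sketch below only illustrates that idea and is not cruncher's code; the function name strip_accents is hypothetical. It drops the beta-code accents (/ \ =) and breathings ( ) and ( ) and leaves everything else, including diaeresis (+) and iota subscript (|), untouched.

```c
#include <string.h>

/* Hypothetical helper illustrating the idea behind -n: copy a beta-code
   word into out, dropping the accents (acute /, grave \, circumflex =)
   and the breathings (smooth ), rough (). Diaeresis (+) and iota
   subscript (|) are kept. out needs room for strlen(in) + 1 bytes. */
void strip_accents(const char *in, char *out)
{
    static const char marks[] = "/\\=)(";
    size_t j = 0;

    for (size_t i = 0; in[i] != '\0'; i++) {
        if (strchr(marks, in[i]) == NULL)
            out[j++] = in[i];
    }
    out[j] = '\0';
}
```

Under this sketch, ai)lou/ros (wrongly accented) and ai)/louros both reduce to ailouros and so compare equal, which is the kind of tolerance the -n switch describes.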

The following other switches are supported.

Switch	Use
-d	database format. This switch changes the output from "Perseus format" to "database format." Output appears in a series of tagged fields.
-e	ending index. Instead of showing the analysis in readable form, this switch gives the indices of the tense, mood, case, number, and so on (as appropriate) in the internal tables.
-k	keep beta-code. When "Perseus format" is enabled (the default), this switch does nothing. When "Perseus format" is off, Greek output is normally converted to the old Greek Keys encoding. This switch disables that conversion so that Greek output stays in beta-code. Note that the handling of this switch was not updated when Latin was implemented, so when "Perseus format" is disabled, Latin and Italian will also be converted to this Greek font encoding. Hence if you are disabling Perseus format in those languages, you should also set the -k switch.
-l	show lemma. When this switch is set, instead of printing the entire analysis, cruncher will only show the lemma or headword from which the given form is made.
-P	turn off Perseus format. Output will be in the form `$feminam& is^M &from$ femina^M $fe_minam^M [&stem $fe_min-& ]^M & a_ae fem acc sg^M`. Note the carriage returns (shown as ^M), without line feeds, between the fields.
-V	analyze Verbs only. When this switch is set, words that are not verbs will not be recognized, and words that could be analyzed as either verb forms or noun forms will be treated as verbs.
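Because -P output separates its fields with bare carriage returns rather than line feeds, a program that consumes it has to split on '\r'. A minimal sketch, assuming one record is already held in a writable buffer; split_fields is a hypothetical name, and the test uses the sample record shown for -P.

```c
#include <string.h>

/* Hypothetical helper: split a -P record in place at carriage returns
   ('\r', displayed as ^M) and any trailing newline. Stores up to max
   pointers in fields[] and returns how many were found. Because strtok
   collapses adjacent delimiters, empty fields are skipped. */
int split_fields(char *record, char *fields[], int max)
{
    int n = 0;
    char *tok = strtok(record, "\r\n");

    while (tok != NULL && n < max) {
        fields[n++] = tok;
        tok = strtok(NULL, "\r\n");
    }
    return n;
}
```

Note that every field after the first keeps its leading space, so a consumer will usually want to trim the fields as well.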

The following switches, which appear in the main routine, do nothing.

Switch	Use
-a	sets the SHOW_ANAL flag, which is never checked
-b	sets the BUFFER_ANALS flag, which is no longer checked
-c	sets the CHECK_PREVERB flag, which is no longer checked
-i	sets the SHOW_FULL_INFO flag, which is never checked
-m	sets the SHOW_MISSES flag, which is never checked
-p	sets the PARSE_FORMAT flag, which is unconditionally turned on later anyway
-s	sets the DBASESHORT flag, which is checked only in a routine that is never called
-x	sets the LEXICON_OUTPUT flag, which is checked only in a routine that is never called

Adding stems

Morpheus recognizes inflected words by comparing the given forms to known stems and endings. Stems are defined to belong to particular inflectional classes, for example first-declension nouns or second-conjugation verbs. Making a new word available to Morpheus involves adding it to the appropriate stems files.

Stems files are in the stemsrc directory under the appropriate language in the Morpheus tree. For example, stems for Latin are in ..../morph/stemlib/Latin/stemsrc. Stems for verbs and nouns are filed separately, because they are compiled by different routines. Indeclinable words, by convention, go into the nouns files. Adjectives are not distinguished from nouns.

The existing stem files for each supported language include one each for irregular nouns and verbs, one each for nouns and verbs extracted from the major dictionary, and one or more additional files for words that are not in the dictionary. These additional files are typically used for words appearing in texts outside the classical period (for example in Byzantine Greek or Neo-Latin) or for proper names. Most such words are nouns, but there is no reason there could not be additional verb files as well. It is convenient for maintenance to use a separate stem file for each new group of unusual words. For example, in Latin, nom.01 contains common quasi-regular words, nom.02 mostly contains words from Plautus, plus the larger numbers, nom.03 mostly contains words from Glass's biography of George Washington, and nom.04 contains words from the Vulgate.

The format of a stem file entry is like this:

:le:lemma
:xx:stem class other

Lines in the file that do not begin with a keyword enclosed in colons are ignored. Each significant line begins with such a keyword, identifying the type of word. The first line of an entry must have the :le: keyword, for the lemma or headword. The next line has a "part of speech" keyword; there may be more than one "part of speech" line for a given lemma. In each "part of speech" line, the first field is the stem, which must be followed by a tab. The rest of the line contains codes for inflectional class and gender, separated by spaces.
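The line format just described is easy to split mechanically. The following is a sketch under the stated rules, not the parser that indexnoms or indexvbs actually use; parse_stem_line is a hypothetical name.

```c
#include <string.h>

/* Hypothetical parser for one stems-file line, ":xx:stem<TAB>codes".
   Splits the line in place. Returns -1 for lines that do not start with
   a keyword enclosed in colons (Morpheus ignores such lines), else 0.
   On a :le: line there is no tab: *stem is then the lemma and *codes
   is an empty string. */
int parse_stem_line(char *line, char **kw, char **stem, char **codes)
{
    char *colon, *tab;

    if (line[0] != ':' || (colon = strchr(line + 1, ':')) == NULL)
        return -1;
    *colon = '\0';
    *kw = line + 1;
    *stem = colon + 1;
    tab = strchr(*stem, '\t');
    if (tab != NULL) {
        *tab = '\0';
        *codes = tab + 1;
    } else {
        *codes = *stem + strlen(*stem);   /* points at an empty string */
    }
    return 0;
}
```

For :no:fe_mi^n followed by a tab and a_ae fem, this yields the keyword no, the stem fe_mi^n, and the codes a_ae fem, which can then be split on spaces.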

The lemma is given in its ordinary form. Vowel quantities are marked only in the stem field, not the lemma. Long vowels are marked by a following underscore (_), short vowels by a following up-arrow (^). It is not necessary to mark the quantities of unambiguous Greek letters (eta, epsilon, omega, omicron), vowels whose quantity is clear from the accent, or vowels in closed syllables; vowels otherwise not marked are considered short. In Greek, the stem field has no accent, though it must have a breathing if the word begins with a vowel.
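Because each quantity mark follows the vowel it qualifies, the plain spelling of a stem can be recovered simply by dropping every _ and ^. The sketch below (strip_quantities is a hypothetical name) does that and also counts the vowels marked long. It should be applied to the stem field only, since class codes such as a_ae also contain underscores.

```c
#include <string.h>

/* Hypothetical helper: remove the quantity marks (_ long, ^ short) from
   a stem field, writing the plain spelling to out, which needs room for
   strlen(stem) + 1 bytes. Returns the number of vowels marked long. */
int strip_quantities(const char *stem, char *out)
{
    int longs = 0;
    size_t j = 0;

    for (size_t i = 0; stem[i] != '\0'; i++) {
        if (stem[i] == '_')
            longs++;
        else if (stem[i] != '^')
            out[j++] = stem[i];
    }
    out[j] = '\0';
    return longs;
}
```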

Here are some examples.

Latin nouns:

:le:femina
:no:fe_mi^n     a_ae fem

:le:amor
:no:am	or_oris masc

:le:Americanus
:aj:America_n   us_a_um

Latin verbs:

:le:quiesco
:vs:quiesc      conj3
:vs:quie_v      perfstem
:vs:quie_t      pp4

:le:creo
:de:cre are_vb

Greek nouns:

:le:ai)/louros
:no:ai)elour os_ou masc fem
:no:ai)lour os_ou masc fem

:le:deino/s
:aj:dein os_h_on suff_acc

Greek verbs:

:le:nomi/zw
:de:nom izw

:le:gra/fw
:vs:gra^f aor2_pass
@ fut

The following are the keywords recognized in stems files.

keyword	indicates
:le:	lemma or headword
:wd:	indeclinable form (preposition, adverb, interjection, etc.) or unanalyzed irregular form
:aj:	adjective; must have an inflectional class
:no:	noun; must have an inflectional class and a gender
:vb:	verb form; for unanalyzed irregular forms
:de:	derivable verb; must have an inflectional class
:vs:	verb stem, one of the principal parts; must have an inflectional class

The inflectional class codes are different for each language. They are the base names of the files in ..../morph/stemlib/language/endtables/source. In general the easiest way to determine the correct class codes is to look at a similar word, such as another noun of the same declension. Gender codes are masc, fem, neut, masc/fem, masc/neut, the latter two used when endings for the two genders are the same. Use "masc fem" for a noun that can be of either gender. Other codes, for number, person, tense, mood, voice, or case, usually only appear in the stems files for irregular forms; these codes are listed under "Adding and changing endings" below.

In general the class code for a noun declension will look like the nominative and genitive, for example a_ae for the Latin first declension. For an adjective, it will look like the three nominative forms, for example os_h_on for Greek first-and-second declension adjectives. Verbs are a bit more complex since the several stems usually need to be specified separately, except for highly predictable groups like the Latin first conjugation.
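Since a class code is just the base name of a file, locating the definition of a class is a matter of building a path. The sketch below assumes the layout described in this document (MORPHLIB set to ..../morph/stemlib, with one subdirectory per language); endings_path is a hypothetical helper, and a caller would normally pass getenv("MORPHLIB") as its first argument.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the path of the endings file that defines
   inflectional class `code` for `language` ("Greek", "Latin", "Italian"),
   i.e. <morphlib>/<language>/endtables/source/<code>.end.
   Returns 0 on success, -1 if morphlib is NULL or buf is too small. */
int endings_path(const char *morphlib, const char *language,
                 const char *code, char *buf, size_t size)
{
    int n;

    if (morphlib == NULL)
        return -1;
    n = snprintf(buf, size, "%s/%s/endtables/source/%s.end",
                 morphlib, language, code);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With MORPHLIB set to /data/sgml/morph/stemlib, the class es_ei for Latin maps to /data/sgml/morph/stemlib/Latin/endtables/source/es_ei.end, the same layout that appears in the buildend error message quoted below.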

Most of the new words that will need to be added are regular, because virtually all of the irregular words are already in the stems files (even for Italian), since they are the most common words in the language.

Once you have added your words, you need to compile the database. In the next directory up from the stems files, that is ..../morph/stemlib/language, you will find a make file; simply run make all. Note several assumptions in these make files:

All nouns files have the string nom in their names.

All verbs files have the string vbs in their names.

All files of the form nom.* or nom[0-9]* are nouns files to be compiled.

All programs are in the path.
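The first two naming assumptions can be expressed as a simple substring test. This is only a sketch of the convention, not code taken from the make files; classify_stem_file and the enum are hypothetical.

```c
#include <string.h>

enum stem_kind { STEM_OTHER, STEM_NOUNS, STEM_VERBS };

/* Hypothetical helper mirroring the make-file convention: a stems file
   whose name contains "nom" is a nouns file, one containing "vbs" is a
   verbs file; anything else is not picked up by either rule. */
enum stem_kind classify_stem_file(const char *name)
{
    if (strstr(name, "nom") != NULL)
        return STEM_NOUNS;
    if (strstr(name, "vbs") != NULL)
        return STEM_VERBS;
    return STEM_OTHER;
}
```

Under these assumptions a new file named nom.05 would be treated as a nouns file, while a name such as Latin_nouns would match neither rule.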

The compilation utilities, like cruncher itself, rely on the MORPHLIB environment variable. This must be set to ..../morph/stemlib, wherever that is on your system. All of the executables are in ..../morph/bin, which must be on the path.

The compilation will produce various messages, most of which can be ignored. True errors will be reported by make in the usual way. Here are examples of the most common messages:

From buildend:
MorphFopen: could not open [/data/sgml/morph/stemlib/Latin/endtables/source/or_uris.end]
This indicates that there is a reference to inflectional class or_uris somewhere in the definitions of endings, but no actual definition of its endings. The stray reference may be in an ordinary endings file (in directory ..../morph/stemlib/language/endtables/source), a basic endings file (..../morph/stemlib/language/endtables/basics), a derivation file (..../morph/stemlib/language/derivs/source), or a rules file (..../morph/stemlib/language/rule_files). If you intend to use this inflectional class, you will need to create its endings file.
If you see this message, you will also see
could not open [or_uris.end] or [endtables/source/or_uris.end]
and, from indendtables,
MorphFopen: could not open [/data/sgml/morph/stemlib/Latin/endtables/out/or_uris.out]
From buildend:
endtables/ascii/a_ae.asc
This is a progress message.
From indendtables:
stype 14000 stype [14000] output file:endtables/indices/nendind
This is a success message indicating the output file the program has created.
From indexnoms or indexvbs:
1000) [2quamquam :quisquam:indef:fem:acc:sg]
This is a progress message.
From indexnoms or indexvbs:
out of qsort done with i=46975, 0 about to index [steminds/nomind] have just indexed [steminds/nomind] bufsiz 5631748 bytes allocated 5631748 bytes successfully! stemcount 46975
This is a progress message.
From indexnoms:
processing 5000: Bacchylid :Bacchylides:es_is:masc
This is a progress message.
From do_conj:
rval 0 stembuf [br] global [] deriv [o_stem] tk [vn,-mm,h_hs]
This indicates that no verb conjugation information could be deduced for the partial stem br.
From buildderiv:
compiling deriv [ire_vb] derivs/ascii/ire_vb.asc
This is a progress message.
From buildderiv:
[reg_conj] not a regular conj [1000003] [2000000]
This indicates that the given verb derivation rule (in ..../morph/stemlib/language/rule_files/derivtypes.table) is not flagged as a regular derivation.
From buildderiv:
output file:derivs/indices/derivind
This is a successful completion message.

In general if you mis-type inflectional class information in a stems file, you will not get a message from the compilation process. You should therefore check your new words once your compilation has finished. Do this by running cruncher and entering several forms of the new words. If they are not recognized, then you have mis-typed something in the stems file.
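Checking new words can be partly automated by comparing each reply with its input, since cruncher echoes an unrecognized word back unchanged. A minimal sketch, assuming one reply line per input word as in the example at the start of this document; word_recognized is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if cruncher's reply line carries an
   analysis of `word`, 0 if the reply is merely the word echoed back.
   A trailing newline or carriage return in the reply is ignored. */
int word_recognized(const char *word, const char *reply)
{
    size_t len = strlen(reply);

    while (len > 0 && (reply[len - 1] == '\n' || reply[len - 1] == '\r'))
        len--;
    return !(len == strlen(word) && strncmp(word, reply, len) == 0);
}
```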

Adding and changing endings

Although the main morphological classes for the supported languages are all defined, it is occasionally necessary to correct a problem, or to add a dialect form. Endings are defined in the ..../morph/stemlib/language/endtables directory and its subdirectories. Two subdirectories, basics and source, contain files that can be edited; the others, ascii, indices, and out, contain the compiled representations of the input files.

The files in ..../endtables/source define the inflectional classes. The names of these files are the inflectional class codes that appear in the stems files. For example, the endings for Latin fifth-declension nouns are defined in es_ei.end and those nouns are listed in the stems files like this:

:le:facies
:no:fa^ci^      es_ei fem

Here is the content of es_ei.end:

e_s	masc fem nom sg
ei_ 	gen sg
ei_	dat sg
em	masc fem acc sg
e_	abl sg

e_s	masc fem nom pl
e_rum	gen pl
e_bus	dat pl
e_s	masc fem acc pl
e_bus	abl pl

e_	dat sg early poetic
e_	gen sg early poetic

In this file, blank lines are ignored. Non-blank lines have two fields, separated by a tab. The first field is the ending and the second tells where it is used. For example, the first line of the file says that e_s (that is, -es with a long e) is the ending for masculine and feminine nominative singular. The gender could in fact have been omitted, as it is for other cases, since all fifth-declension nouns have the same endings regardless of gender. (Moreover, every noun in this declension is feminine except dies and its compounds.) Long vowels are marked as in the stems files, with a following underscore. Short vowels are not marked.
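A reader for this two-field format might look like the sketch below. It is an illustration, not buildend's code; parse_ending_line is a hypothetical name. It also trims trailing spaces from the ending, which matters for the gen sg line of es_ei.end, where a space precedes the tab.

```c
#include <string.h>

/* Hypothetical parser for one line of an endings (.end) file,
   "ending<TAB>description". Splits the line in place, dropping the line
   terminator and any spaces between the ending and the tab.
   Returns 1 for a blank line (skip it), -1 if there is no tab, 0 on success. */
int parse_ending_line(char *line, char **ending, char **desc)
{
    char *tab, *p;

    line[strcspn(line, "\r\n")] = '\0';        /* drop line terminator */
    if (line[strspn(line, " \t")] == '\0')
        return 1;                              /* blank line */
    tab = strchr(line, '\t');
    if (tab == NULL)
        return -1;
    *tab = '\0';
    for (p = tab; p > line && p[-1] == ' '; p--)
        p[-1] = '\0';                          /* "ei_ " becomes "ei_" */
    *ending = line;
    *desc = tab + 1;
    return 0;
}
```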

The codes for genders are as in the stems files, masc, fem, neut, masc/fem, masc/neut. Number codes are sg, dual, pl. Cases are nom, gen, dat, acc, abl, voc. For verbs, persons are 1st, 2nd, 3rd, numbers are as for nouns, and voices are act, mid, pass, mp. Tenses are pres, imperf, fut, aor, perf, plup, futperf. Moods are ind, subj, opt, imperat, inf, part, supine, gerundive (there is no code for the gerund as distinct from the gerundive).

Other modifying codes include early, poetic, attic, doric, and so on. All of these codes are defined in morphkeys.h in the src/morphlib subdirectory (see below).

The endings file for the Latin fifth declension is not typical. More often, an inflectional class is defined by reference to another class. For example, participles use the endings of adjectives, and several different verb tenses and moods use the same groups of endings. To express these relationships, Morpheus defines basic endings and then references them in inflectional class files. For example, the Greek noun class c_ktos (as in anax) is defined like this:

c c_ktos masc fem nom voc sg 
* c_ktos neut nom voc acc sg
kt@decl3 c_ktos

This says that masculine and feminine nouns of this class have their nominatives ending in c, neuters have simply the stem for the nominative, and the remaining cases end in -kt- plus the appropriate third-declension ending.

The @decl3 reference is to a file in the ..../endtables/basics directory. That directory contains groups of endings that can be re-used. The format of the "basics" files is the same as that of the ordinary inflectional class endings files, and their names are also *.end. To use a basic endings group in an inflectional class file, put its name, preceded by an at sign, in the place of the actual endings -- or even parts of endings, as in the example above.
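How an entry such as kt@decl3 combines a literal piece with a group of basic endings can be sketched as follows. This is an illustration under assumptions, not the expansion code in buildend: split_basic_ref and expand_group are hypothetical, and the three endings in the test (os, i, a for genitive, dative, accusative singular) are only a small sample of the Greek third declension.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: split `field` at '@'. The part before it is copied into
   prefix; *group points at the basic-group name, or is NULL if there is
   no reference. Returns 1 if a reference was found, 0 if not, -1 if
   size is 0. An over-long prefix is truncated. */
int split_basic_ref(const char *field, char *prefix, size_t size,
                    const char **group)
{
    const char *at = strchr(field, '@');
    size_t n = at ? (size_t)(at - field) : strlen(field);

    if (size == 0)
        return -1;
    if (n >= size)
        n = size - 1;
    memcpy(prefix, field, n);
    prefix[n] = '\0';
    *group = at ? at + 1 : NULL;
    return at != NULL;
}

/* Hypothetical: write prefix + ending for each of the n endings. */
void expand_group(const char *prefix, const char *const endings[], int n,
                  char out[][32])
{
    for (int i = 0; i < n; i++)
        snprintf(out[i], sizeof out[i], "%s%s", prefix, endings[i]);
}
```

Applied to a stem a)na, the expanded ending ktos would give a)/naktos (accent supplied), the genitive of anax, while the nominative takes c to give a)/nac.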

There is a further way to relate different inflectional classes, using the derivs directory. Files in ..../derivs/source pull together information about stem formation and inflectional classes. They are only used for verb classes. For example, Latin fourth-conjugation verbs are defined in ire_vb.deriv as follows:

*	conj4
*	ivperf
i_	perfstem
i_t	pp4

Here the second field contains references to basics files. This file says that this class of words takes the endings of conj4 and of ivperf, that the perfect stem is formed by adding long i, and that the fourth principal part is formed by adding long i followed by t. Verbs can then be declared in the stems files to be of this class, for example:

:le:munio
:de:mun ire_vb
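The stem-forming half of such a .deriv file can be sketched as simple concatenation, where * means the base is used unchanged. This is an assumption-based illustration (derive_stem is hypothetical), not buildderiv itself.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration of the first field of a .deriv line:
   "*" means use the base unchanged, anything else is appended to it.
   Writes the result to out; returns 0, or -1 if out is too small. */
int derive_stem(const char *base, const char *field, char *out, size_t size)
{
    const char *suffix = strcmp(field, "*") == 0 ? "" : field;
    int n = snprintf(out, size, "%s%s", base, suffix);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For mun this gives mun, muni_ and muni_t, the stems behind the familiar principal parts munio, muni_vi and muni_tum once the appropriate endings are attached.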

There is one further complication to endings files. In the rule_files directory are two files that determine whether inflectional class files apply to verbs, nouns, or adjectives. The derivtypes.table file must list every file from the derivs/source directory. The stemtypes.table file must list every file from the endtables/source directory. If you add a new inflectional class, you will also need to declare it here. In each of these tables, the second field is a serial number and the third describes what kind of object is being declared.

Once you have created or modified endings files, you can add or update stems entries to use them; you do not need to compile the database first. But once you have finished all the modifications to endings and stems, you must compile the database, as described above.

An introduction to the code

Source code for Morpheus, written in C (mostly, though not entirely, ANSI C), is in the ..../morph/src directory tree. There is a make file at top level in the src directory which controls compilation of the six libraries and twenty-six main programs that make up Morpheus. Those programs are installed into ..../morph/bin.

The main routine for cruncher is ..../morph/src/anal/stdiomorph.c. The actual work happens in subroutine checkstring and its subsidiaries, all in file ..../morph/src/anal/checkstring.c. Most of the significant modifications and bug fixes over the past three years have been in this file as well.

The executables used in compiling the database (see above) are

in ..../morph/src/gener, do_conj built from conjmain.c

in ..../morph/src/gkdict, indexnoms built from indexnoms.main.c and indexvbs built from indexvbs.main.c

in ..../morph/src/gkends, buildend built from expendmain.c, buildderiv built from expsuffmain.c, buildword built from expwordmain.c, indderivtables built from smain.c, indendtables built from imain.c

There are other executable routines (see the makefiles in the various directories), but they are not currently used.

Most header files are in ..../morph/src/includes, though some are in the code directories. Directories ..../morph/src/greeklib and ..../morph/src/morphlib contain utility routines which get linked into object libraries. Each of the code directories anal, gener, gkdict, and gkends also has an object library for its subroutines. Executables are statically linked against all these object libraries.

Other directories in the source tree contain related code which is not actually part of Morpheus. Directory auto contains code for character encoding conversions. Directory retr has a search engine for the TLG CD. Directory scan has initial experiments toward scansion. Directory tlg has a one-file TLG search engine; a comment at the head of the file calls it "unbelievably ugly and impossible to figure out." Finally, directory play is a space for toy routines.

The main loop of cruncher is quite simple: it reads a string from stdin, drops white space, and passes the trimmed string to checkstring. It then displays the result on the output file, typically stdout. This continues until end of file on input.
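The loop just described can be sketched in standard C. This is not stdiomorph.c: the analyzer callback stands in for checkstring plus output formatting, echo_only mimics what cruncher prints for an unrecognized word, and skipping blank lines is an assumption of the sketch.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef void (*analyzer)(const char *word, FILE *out);

/* Stand-in for checkstring plus output: echo the word back, which is
   what cruncher does when it finds no analysis. */
static void echo_only(const char *word, FILE *out)
{
    fprintf(out, "%s\n", word);
}

/* Read lines until EOF, trim surrounding white space, and pass each
   non-empty word to analyze. Returns the number of words analyzed.
   Lines longer than the buffer are handled in pieces. */
int run_loop(FILE *in, FILE *out, analyzer analyze)
{
    char buf[256];
    int count = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        char *start = buf;
        char *end = buf + strlen(buf);

        while (isspace((unsigned char)*start))
            start++;
        while (end > start && isspace((unsigned char)end[-1]))
            end--;
        *end = '\0';
        if (*start == '\0')
            continue;
        analyze(start, out);
        count++;
    }
    return count;
}
```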

The real work is driven by checkstring. This routine comes in five layers: checkstring calls checkstring1, which calls checkstring2, which calls checkstring3, which calls checkstring4. In each case, if the next lower layer does not recognize the word, we adjust -- for crasis, enclisis, dialect forms, or the like -- and try again. The innermost layer, checkstring4, calls checkword (in checkword.c), which ultimately calls the routines in ..../morph/src/gkdict/dictio.c to look up the word in the actual tables. In the case of a simple word, such as ego (in either language), this is all we need to do. For inflected words, checknom and checkverb peel off letters one at a time from the beginning of the word until they recognize an ending. If the peeled-away part is recognizable as a stem (or a stem with a prefix), then this is a possible analysis.
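The stem-plus-ending principle behind checknom can be shown with a toy. This is not the Morpheus code: the real routines consult the compiled indices through dictio.c and handle quantity marks, prefixes, and many classes. The one-stem table and the unmarked a_ae endings below are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

struct ending { const char *text; const char *desc; };

/* Toy tables (assumptions for illustration): one stem of class a_ae
   and the a_ae endings written without quantity marks. */
static const char *const stems[] = { "femin" };
static const struct ending a_ae[] = {
    {"a", "nom sg"}, {"ae", "gen sg"}, {"ae", "dat sg"}, {"am", "acc sg"},
    {"a", "abl sg"}, {"ae", "nom pl"}, {"arum", "gen pl"}, {"is", "dat pl"},
    {"as", "acc pl"}, {"is", "abl pl"},
};

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Try every split of word into a non-empty stem and a non-empty ending;
   each split whose stem and ending are both known is one possible
   analysis. Prints analyses to out (if non-NULL); returns their number. */
int peel_analyze(const char *word, FILE *out)
{
    size_t len = strlen(word);
    int found = 0;

    for (size_t i = 1; i < len; i++) {
        const char *end = word + i;
        for (size_t s = 0; s < COUNT(stems); s++) {
            if (strlen(stems[s]) != i || strncmp(word, stems[s], i) != 0)
                continue;
            for (size_t e = 0; e < COUNT(a_ae); e++) {
                if (strcmp(end, a_ae[e].text) == 0) {
                    if (out != NULL)
                        fprintf(out, "%s-%s: %s\n", stems[s], end, a_ae[e].desc);
                    found++;
                }
            }
        }
    }
    return found;
}
```

Even this toy shows why one form can have several analyses: feminae matches three table entries for the same ending.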

If checkword does not find any analyses, then checkstring4 looks for spelling variations: cun- for sun- or -ss- for -tt- in Greek. If checkstring4 does not find any analyses, then checkstring3 looks at capitalization, elision or prodelision, attached enclitics (Greek -per, Italian pronouns, Latin -que, -ve, -ne), and alternation between i and j or u and v. If checkstring3 does not find any analyses, checkstring2 tries various Greek dialects. If checkstring2 does not find any analyses, checkstring1 looks at initial prodelision in Greek. And if checkstring1 does not find any analyses, checkstring assumes the word is simply not recognized.
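The kind of candidate that checkstring4 generates can be sketched as follows. The helpers are hypothetical and only produce spellings; each candidate would still have to be passed back to checkword. Which spelling is typed and which is stored in the tables is an assumption here; the real code tries whichever direction suits its data.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: if word begins with cun- (the older spelling xun-),
   write the sun- spelling to out (truncated if out is too small)
   and return 1; otherwise return 0. */
int variant_sun(const char *word, char *out, size_t size)
{
    if (strncmp(word, "cun", 3) != 0)
        return 0;
    snprintf(out, size, "sun%s", word + 3);
    return 1;
}

/* Hypothetical: copy word to out with every occurrence of `from`
   replaced by `to` (e.g. "ss" -> "tt"). Returns the number of
   replacements, or -1 if from is empty or out is too small. */
int replace_all(const char *word, const char *from, const char *to,
                char *out, size_t size)
{
    size_t flen = strlen(from), tlen = strlen(to), j = 0;
    int count = 0;

    if (flen == 0 || size == 0)
        return -1;
    while (*word != '\0') {
        if (strncmp(word, from, flen) == 0) {
            if (j + tlen >= size)
                return -1;
            memcpy(out + j, to, tlen);
            j += tlen;
            word += flen;
            count++;
        } else {
            if (j + 1 >= size)
                return -1;
            out[j++] = *word++;
        }
    }
    out[j] = '\0';
    return count;
}
```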

The main data structure behind all this is the gk_word structure, accessed throughout checkstring by the pointer Gkword. Structure gk_word is defined in ..../morph/src/includes/gkstring.h. It includes character buffers for the original word and the working form of the word (as adjusted for spelling, dialect, and so on). It also includes flags for various options, including those that can be set on the cruncher command line.

The gk_word structure is not manipulated directly but via routines like set_workword and set_prntflags. That is, although the code is written in C and long pre-dates C++, it uses data hiding principles similar to those of object-oriented languages. Maintainers are strongly urged to respect this design.
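The accessor style can be illustrated with a much-reduced analogue of gk_word. Only set_workword is a real Morpheus name; struct word_state and the ws_ functions below are hypothetical. In true data hiding the structure body would live only in the .c file, with callers seeing an incomplete type; it is shown inline here to keep the sketch in one file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-reduced analogue of gk_word. */
struct word_state {
    char rawword[64];    /* word as typed */
    char workword[64];   /* working form after spelling/dialect adjustments */
};

/* Allocate a new state; both buffers start as a copy of raw. */
struct word_state *ws_new(const char *raw)
{
    struct word_state *w = malloc(sizeof *w);

    if (w != NULL) {
        snprintf(w->rawword, sizeof w->rawword, "%s", raw);
        snprintf(w->workword, sizeof w->workword, "%s", raw);
    }
    return w;
}

/* Accessors: the only sanctioned way to read or change the fields. */
void ws_set_workword(struct word_state *w, const char *s)
{
    snprintf(w->workword, sizeof w->workword, "%s", s);
}

const char *ws_workword(const struct word_state *w) { return w->workword; }
const char *ws_rawword(const struct word_state *w) { return w->rawword; }
void ws_free(struct word_state *w) { free(w); }
```

Keeping the original and working forms apart means a layer that tries a spelling variant can change the working form while the word as typed stays available for output.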

Finally, it should be noted that the earliest stages of development took place without the use of a version management system. Later, when Perseus adopted such a system, it took a while for everyone to become comfortable with its use. As a result, there are many commented-out sections of code (rarely necessary when it is easy to inspect older versions), and many un-informative log messages ("Nightly backup" is common), making it hard to recover the early history of the code. But the last three years' work should be reasonably well accounted for.

What you can do with the results

Most Perseus users see Morpheus only in the context of the Word Study Tool links on Greek, Latin, and Italian words. On text pages, these links are created by routine morph_links (in ..../cgi-bin/IncPerl/FilterText.pm); on Lookup Tool pages, they come from the independent program ..../cgi-bin/Support/umorphck. Each of these routines works on the generated HTML page just before it is returned to the client browser. Greek is identified by <G> elements, Latin by <L>, and Italian by <IT>. These elements are inserted by the transformation routines whenever an element has a suitable lang attribute. The morphology linking routines must tokenize the stream into words, skipping over embedded HTML elements, then insert links to morphindex. So that users are not annoyed by links that do not produce results, morph_links and umorphck use a cache of known forms to determine which words should be linked. Mis-spelled words and words that are not known to Morpheus will not receive Word Study links.

The cache is created by compilation routines in the XML build directory ..../sgml/xml and stored as ..../cgi-bin/DBs/language/mdb.⁴ The langinstall target in the XML Makefile handles building this cache for each language and copying it to the correct place. At each build, we make ..../sgml/xml/morph/language.words, a list of all the words of the given language in the system that have not been seen before. At the same time, we merge the previous language.words file with language.words.old, thus updating the list of previously-seen words. We then run cruncher over any words in the new language.words file. Those for which cruncher finds analyses are added to the cache; those for which it does not are listed in ..../sgml/xml/morph/language.failed. It is advisable to delete all of the files in ..../sgml/xml/morph/, or all the files for one language, whenever you have made significant changes to the Morpheus stems or endings tables, because those changes will not be reflected in the cache otherwise. That is, if you have added a stem, but its forms are already in the list of previously-seen words, those forms will not be re-analyzed and will therefore not have links in the on-line system. To force re-analysis, then, delete the files in ..../sgml/xml/morph/.

Other compiled files include the wmdb database, the freqs database, and the inflex database. The wmdb database (..../cgi-bin/DBs/language/wmdb) gives, for each analyzed form, the list of headwords it might come from, with weights. For example, Latin facies could be either a form of the noun facies or a form of the verb facio, so it is given a weight of 1/2 for each of those headwords. (That it could, in fact, be any of three forms of the noun is not relevant here, only that it could come from either of two words.) This database is created from the output of cruncher by program ..../sgml/xml/weightmorph. It is used by the Lookup Tool, to relate user input to headwords for lemmatized searching, and, during compilation, by lemsens, to create the lemmatized sentence files.
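The facies example implies that weights are shared among distinct headwords, not among individual analyses. A sketch assuming equal shares, as in that example (whether weightmorph ever weights unequally is not stated here); distinct_lemmas is a hypothetical name.

```c
#include <string.h>

/* Hypothetical illustration of the weighting in the facies example:
   given the lemma of every analysis of a form (duplicates allowed),
   collect the distinct lemmas into out[] and return their count.
   With equal shares, each distinct lemma gets weight 1.0 / count. */
int distinct_lemmas(const char *const lemmas[], int n, const char *out[])
{
    int count = 0;

    for (int i = 0; i < n; i++) {
        int seen = 0;
        for (int j = 0; j < count; j++) {
            if (strcmp(out[j], lemmas[i]) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            out[count++] = lemmas[i];
    }
    return count;
}
```

Three noun analyses plus one verb analysis of facies thus reduce to two headwords, each weighted 1/2.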

The freqs database contains the number of occurrences of each word in each corpus. Keys look like facies#perseus,author,Plautus and values look like 50 23 36.5. That is, the key is a headword followed by the name of a corpus (whose official name might be Perseus:corpus:perseus,author,Plautus; see corpora.xml), and the value is the maximum, minimum, and weighted occurrences of this word in this corpus. This database is also stored in the language database directory. It is used by the Word Study Tool, the lexicon display routines, and the frequency tool. During compilation, it is used by catalog and collect_coll, the routines that underlie the Vocabulary Tool.
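The key and value formats above are simple enough to build and parse with the standard library. A minimal sketch; freq_key and freq_value are hypothetical helpers, not part of the Perseus code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: build a freqs key "headword#corpus".
   Returns 0, or -1 if buf is too small. */
int freq_key(const char *headword, const char *corpus, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%s#%s", headword, corpus);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical: parse a freqs value "max min weighted".
   Returns 0 on success, -1 if fewer than three numbers are present. */
int freq_value(const char *value, double *maxcount, double *mincount,
               double *weighted)
{
    return sscanf(value, "%lf %lf %lf", maxcount, mincount, weighted) == 3 ? 0 : -1;
}
```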

The inflex database gives, for each headword, a list of all the inflected forms of that word attested in the texts. It is currently used only by the frequency tool, to verify that the word of interest is actually a headword; it used to be used by psearch, the full-text search routine that preceded the unified Lookup Tool.

In the compilation process, cruncher is only run once for each language, in the step that creates the known-words cache. In normal operation of the run-time system, it should not be run at all, though it can be called by morphindex if morphindex is run interactively and the user enters a form that is not in the known-words cache. None of the other Morpheus programs is run other than during compilation of the Morpheus database itself.

Notes

1. See the bibliography of Perseus publications.

2. On a standard Perseus development system, this program will be in your path and the necessary environment variables will be set. For more on this see the instructions on compiling the database.

3. Strictly, Morpheus, like all Perseus tools, uses the Perseus subset of beta-code. Unlike true TLG beta-code, Perseus beta-code expects the ASCII letters to be lower case. The subset includes only the letters, accents, breathings, diaeresis, and iota subscript, not any of the markup codes ("escapes" in TLG terminology).

4. The actual files are mdb.md and mdb.mi, making up an Mdb database. "Mdb" here stands for "morphology database," an in-house database format devised for exactly this purpose. Mdb databases are implemented with two files, one for data (.md) and the other for the index (.mi). They are populated with pMDBmaker (or its earlier version MDBmaker), can be tied as Perl hashes with the Mdb_File package, and can be inspected with mdb and mdb_dump.

Text written by Anne Mahoney, September 2004.
