TRANSLAT - computer translation of Botanical Latin

Peter D. Bostock

July 2009


BACKGROUND

For some years now, I have been interested in the use of personal computers in botanical research. The obvious uses are mostly well served by "off-the-shelf" packages: methods for storing and analysing taxonomic and ecological data using various database applications, including DELTA and DECODA; spreadsheets and other statistical tools for the analysis and graphical display of univariate and multivariate data; and mapping software for analysing and displaying geographically based data.

In contrast, language translation is generally very poorly served, although it has always seemed to me a rich field for the application of computers. Bilingual dictionaries are available in printed form and, increasingly, on computer, but these often ignore case, declension and other real-world problems of translation, and they usually ignore the idiomatic nature of language as well. My interest in botanical Latin was fostered by W.T. Stearn's famous tome, and led me to explore the possibility of translating the relatively well-structured language employed in descriptions and diagnoses. This interest has culminated in the computer program TRANSLAT, described below.

TRANSLAT uses indexed on-disk databases of verbs, adjectives, nouns, pronouns, phrases and adverbs (including conjunctions and prepositions). It matches the stems and terminations (flexions or endings) of botanical Latin words, or the whole word if it is indeclinable, to provide both a literal/figurative English meaning and an optional statement of the `grammar', i.e. the gender, number and case (together with the mood and tense of verbs, or the degree of comparison of adjectives and adverbs).

The translation method is best described as `informed brute force'! The program employs a three-word buffer, allowing one word of look-behind and one of look-ahead. This caters, for example, for the inverted (postpositive) position of versus, and also facilitates contextual modification of the English phrasing (for example, dropping the implied English prepositions with/by/in after a Latin preposition governing the ablative). However, the actual translation process is simply one of trying every likely Latin word (and all of its valid endings) until a match occurs.
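A three-word buffer of this kind can be sketched in Python as a generator that yields (previous, current, next) triples. This is a minimal illustration under assumptions, not TRANSLAT's QuickBASIC code: the function names `three_word_buffer` and `reorder_versus` are hypothetical, and the reordering only shows how one word of look-ahead lets a postpositive versus be moved in front of its object.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

def three_word_buffer(words: Iterable[str]) -> Iterator[Tuple[Optional[str], str, Optional[str]]]:
    """Yield (previous, current, next) for every word; None marks the text edges."""
    it = iter(words)
    try:
        current = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    previous: Optional[str] = None
    for following in it:
        yield previous, current, following
        previous, current = current, following
    yield previous, current, None

def reorder_versus(words: Iterable[str]) -> List[str]:
    """Hypothetical example: use one-word look-ahead to put 'versus' before its object."""
    out: List[str] = []
    for previous, current, following in three_word_buffer(words):
        if current == "versus" and previous is not None:
            continue  # already emitted together with the preceding word
        if following == "versus":
            out.extend(["versus", current])
        else:
            out.append(current)
    return out
```

For example, `reorder_versus(["apicem", "versus", "attenuata"])` gives `["versus", "apicem", "attenuata"]`, which maps onto the English order "towards the apex, attenuate".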

Each database is interrogated in turn using sets of indexed keys that are loaded into memory when the program starts; initial matching uses only the first letter, or the first two letters, of the unknown Latin word. The matching subset of database entries is then cycled through an extended matching process on the stem and/or nominative singular (a description that is clearly inadequate for the processing of verbs!). Each `stem' that matches at this level is then declined in turn until a final match occurs on the full unknown word. Shortcuts are, of course, available where the endings are known to be common to all numbers or genders, etc.
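The index-then-decline process can be illustrated with a deliberately tiny noun database. The following Python sketch assumes a hypothetical two-entry database and small ending tables (TRANSLAT's actual .DB layout and ending files are not described here): entries are indexed by the first two letters of the stem, and each candidate stem is then tested against its ending table until the full word matches, with a fallback for irregular nominatives such as flos.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical two-entry noun database; TRANSLAT's real .DB layout is not described here.
NOUNS = [
    {"nominative": "folium", "stem": "foli", "decl": "2nd-neuter", "meaning": "leaf"},
    {"nominative": "flos", "stem": "flor", "decl": "3rd-masculine", "meaning": "flower"},
]

# Small illustrative ending tables (ending -> grammar label).
ENDINGS: Dict[str, Dict[str, str]] = {
    "2nd-neuter": {"um": "nom./acc. sing.", "i": "gen. sing.", "o": "dat./abl. sing.",
                   "a": "nom./acc. pl.", "orum": "gen. pl.", "is": "dat./abl. pl."},
    "3rd-masculine": {"is": "gen. sing.", "i": "dat. sing.", "em": "acc. sing.", "e": "abl. sing.",
                      "es": "nom./acc. pl.", "um": "gen. pl.", "ibus": "dat./abl. pl."},
}

def build_index(entries: List[dict]) -> Dict[str, List[dict]]:
    """Index entries by the first two letters of the stem (the in-memory key set)."""
    index: Dict[str, List[dict]] = defaultdict(list)
    for entry in entries:
        index[entry["stem"][:2]].append(entry)
    return index

def lookup(word: str, index: Dict[str, List[dict]]) -> List[Tuple[str, str]]:
    """Test each candidate stem; the remainder must be a valid ending for that entry."""
    word = word.lower()
    matches: List[Tuple[str, str]] = []
    for entry in index.get(word[:2], []):
        stem = entry["stem"]
        if word.startswith(stem):
            label = ENDINGS[entry["decl"]].get(word[len(stem):])
            if label is not None:
                matches.append((entry["meaning"], label))
                continue
        if word == entry["nominative"]:  # irregular nominative such as flos
            matches.append((entry["meaning"], "nom. sing."))
    return matches
```

With `idx = build_index(NOUNS)`, `lookup("floribus", idx)` returns `[("flower", "dat./abl. pl.")]`. Indexing on two letters keeps each candidate subset small; the sketch assumes a stem and its nominative share their first two letters.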

If no match is found during this process, the program then looks for prefixes (see Appendix, section 3) and a limited set of suffixes (primarily for comparative, superlative and diminutive adjectives and comparative and superlative adverbs) and re-interrogates the adjective database looking for a match.
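The prefix fallback can be sketched in Python as follows, using the prefix list given in Appendix section 3. This is an assumption-laden illustration, not TRANSLAT's code: `split_prefix` is a hypothetical name, suffix handling is omitted, and `is_adjective` stands in for the full adjective lookup (which in the real program would also decline the remainder). Longer prefixes are tried first so that, for example, pseudo is tested before pseud.

```python
from typing import Callable, Optional, Tuple

# Prefixes as listed in Appendix section 3 of this document.
PREFIXES = ["aequi", "atro", "austro", "bi", "crassi", "di", "extra", "e", "ex", "hemi",
            "hypo", "infra", "intra", "in", "multi", "pachy", "palaeo", "pauci", "per",
            "pinnati", "pluri", "poly", "prae", "pseud", "pseudo", "quadri", "quadr",
            "quinque", "quinqui", "quinqu", "semi", "sesqui", "sub", "supra", "tripli",
            "tri", "uni"]
# Try longer prefixes first so that, e.g., "pseudo" is tested before "pseud".
_LONGEST_FIRST = sorted(PREFIXES, key=len, reverse=True)

def split_prefix(word: str, is_adjective: Callable[[str], bool]) -> Optional[Tuple[str, str]]:
    """Return (prefix, remainder) if the remainder is a known adjective.

    ("", word) means the whole word is already known; None means no match.
    """
    word = word.lower()
    if is_adjective(word):
        return "", word
    for prefix in _LONGEST_FIRST:
        remainder = word[len(prefix):]
        if word.startswith(prefix) and remainder and is_adjective(remainder):
            return prefix, remainder
    return None
```

With a hypothetical set `known = {"sphaericus", "lobatus"}`, `split_prefix("hemisphaericus", known.__contains__)` returns `("hemi", "sphaericus")`.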

Commonly used phrases are also pre-programmed (e.g. plus minusve), as are abbreviations such as diam., cm. and dm. Numbers are ignored. Recognised pronouns include the "doubly-declined" compounds such as quicumque and utercumque (see Appendix, section 4). The trailing enclitics -que (and) and -ve (or) are recognised, as is the particle -ne.
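Enclitic handling fits the same brute-force pattern: try the whole word first, and only then strip a trailing -que, -ve or -ne and look up what remains. The Python sketch below is illustrative only; `split_enclitic` and the `is_known` callable are hypothetical stand-ins for the database lookups. Trying the whole word first is what stops a pronoun such as quisque being misread as quis plus -que.

```python
from typing import Callable, Optional, Tuple

ENCLITICS = {"que": "and", "ve": "or", "ne": "(interrogative particle)"}

def split_enclitic(word: str, is_known: Callable[[str], bool]) -> Optional[Tuple[str, Optional[str]]]:
    """Return (base word, enclitic meaning or None), or None if nothing is recognised."""
    word = word.lower()
    if is_known(word):  # whole word first, so quisque is not read as quis + -que
        return word, None
    for enclitic, meaning in ENCLITICS.items():
        base = word[: -len(enclitic)]
        if word.endswith(enclitic) and base and is_known(base):
            return base, meaning
    return None
```

For instance, with a known-word set containing minus, `split_enclitic("minusve", known.__contains__)` returns `("minus", "or")`, as needed for plus minusve.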

Adjectival prefixes (with a few exceptions) are not recognised if separated by a hyphen, whereas compound adjectives must be hyphenated; compare hemisphaericus with flavo-virescens. The program also recognises nouns and adjectives that occur in only one number, i.e. singular only or plural only.

Translating verbs was among the most onerous programming tasks I have ever attempted. Most moods are covered, except the imperative and some infinitive forms (specifically the perfect, pluperfect and future infinitives). Gerunds and gerundives are also translated. Anomalous verbs, including eo (and its compounds), fio, fero and possum, are more or less covered, as are deponents. The impersonal third-person usage of certain verbs is recognised. The meanings produced for the various tenses are a subset of the accepted meanings and may require liberal re-translation by the user! If you would like examples of this, and are feeling particularly adventurous, try translating at random any discussion from Ferdinand von Mueller's Fragmenta Phytographiae.

The program can be sped up by forgoing the translation of verbs (although adjectival forms such as present and past participles are always included); the run-time parameter /NOV invokes this option. I have limited the number of meanings of most verbs to the few (usually 2, rarely up to 8) that seem most applicable (the choice was entirely mine), so some English translations of verbs will appear very clumsy. An occasional error to which I seem prone is leaving out a hyphen when compressing the English data: for example, "lov-e, -es, -ed, -ed, -ing" stored in error as "love-s, es, -ed, -ed, ing". This gives meanings such as "he es" instead of "he loves", or "I was ing" instead of "I was loving"! Please let me know if you find examples of this.
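The effect of a missing hyphen is easy to see if the compressed notation is expanded mechanically. The following Python sketch assumes, from the example above rather than from any description of TRANSLAT's file format, that the text before the first hyphen is the English stem, that each later item beginning with "-" is appended to that stem, that an item without a leading hyphen is taken literally, and that the five slots are the base form, third-person singular, past tense, past participle and present participle. `expand_forms` is a hypothetical name.

```python
from typing import List

def expand_forms(compressed: str) -> List[str]:
    """Expand a compressed entry such as 'lov-e, -es, -ed, -ed, -ing' into full forms."""
    items = [item.strip() for item in compressed.split(",")]
    # The stem is whatever precedes the first hyphen of the first item.
    stem, _, first_tail = items[0].partition("-")
    forms = [stem + first_tail]
    for item in items[1:]:
        # A leading hyphen means "append to the stem"; otherwise the item is used as-is.
        forms.append(stem + item[1:] if item.startswith("-") else item)
    return forms
```

`expand_forms("lov-e, -es, -ed, -ed, -ing")` yields `["love", "loves", "loved", "loved", "loving"]`, whereas the faulty `"love-s, es, -ed, -ed, ing"` puts the bare strings "es" and "ing" into the third-person and present-participle slots, which is exactly how "he es" and "I was ing" arise.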

A conservative estimate of the number of distinct words recognised by the program is about 350,000, although the number of words actually recorded in the databases is currently about 7,400. This projected figure does not include the multiplying factor, almost impossible to calculate, of the prefixes, diminutives, comparatives and superlatives that can be applied to most adjectives.
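Taken together, the two figures above imply that each stored entry accounts, on average, for roughly 47 recognised forms, before prefixes and other derivatives are counted:

```latex
\frac{350\,000 \text{ recognised forms}}{7\,400 \text{ stored entries}} \approx 47.3 \text{ forms per entry}
```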

As far as speed goes, the brute-force method is moderately successful. The average translation time per Latin word is about 0.38-0.48 seconds on a 12 MHz AT (80286) with no disk caching and a hard disk with a 24 ms average access time. On an 80486DX33 with full disk caching and a hard disk of about 15 ms access time, the time per word is of the order of 0.07-0.14 seconds, while on a Pentium 133 under Win95 it is 20 milliseconds (0.02 seconds) per word. TRANSLAT reports the number of words translated and the average time per word at the end of each input file!

TRANSLAT has been tested under Windows Vista Home Premium, Windows XP (up to SP3), Windows 2000, Windows NT 4.0 (SP3 and SP4), Windows NT 3.51 SP3, Windows 95 and Windows 3.1x (with DOS 3.x and above); it should also behave under DR DOS 6.x. Because the menu system is text-based rather than graphics-based, it can be run in a window under Windows 3.1 if 386-enhanced mode is used. Some icon files (e.g. TRANSLAT.ICO) and PIF files (TRANSBIN.PIF, TRANSLAT.PIF etc.) are included for use under Windows. It has also been tested on some Macs using MS-DOS emulators. Email me for more details if you are interested.

Memory Requirements

TRANSLAT currently requires about 460 KB of memory. The program must be run from a hard disk, although the Latin text files could be on a floppy disk (not recommended). The Latin input files must be saved in DOS text format, and output from the program is also in DOS text format.

TRANSLAT is also quite well behaved on machines with extended memory under DOS 5.0/6.xx when DOS is loaded high, i.e. there is no need to resort to LOADFIX. TRANSLAT is written in Microsoft QuickBASIC 4.5, with language extensions from Crescent Software (PDQ version 3.0) and TOOLBOX by Mark Goodwin (MIS Press, 1989).

NOTE: TRANSLAT is not able to decline Latin words which are not stored in its databases, although it can make a guess (via option /GUESS). Hence, if you find that Latin words for your special group of plants are not recognised, and your words are correctly spelt, compile a list (including suggested English meanings), or send sample .LAT files, and I will issue updated databases periodically. In a future release, I intend to provide the necessary programs bundled with TRANSLAT to create and modify the databases.

ACKNOWLEDGEMENT: My interest in Botanical Latin, and the stimulus for this program, both arose after I received a copy of the encyclopaedic Botanical Latin by Professor William T. Stearn, and I gratefully acknowledge this fact. Other sources are listed in the program, by pressing the function key F4.

DISCLAIMER: Neither the Author (Peter D. Bostock) nor the Author's employers (Queensland Department of Environment and Resource Management) accept any liability should any person incur expense or damage resulting from the use of this program.

APPENDIX

Contents: 1. Installation; 2. Run-time Considerations; 3. Prefixes; 4. Compound Pronouns; 5. Additional Information about the Files

1. INSTALLATION

The programs and data files are installed via a self-extracting Zip file, TRANINST.EXE (see below). If a different directory structure is required, use XTree or similar to rename/relocate the files. The only file that needs to be modified, if the default directory names and structure are not followed, is TRANSLAT.SET (see below for details). NB: the file hosted at the Geocities mirror is a plain ZIP file (it costs money to store .EXE files at that site).

The default structure is:

If a drive letter other than C: is used, edit TRANSLAT.SET (with a text editor, as it is an ASCII data file) to reflect the new drive letter. If different subdirectory names are required, similarly edit the relevant entries in TRANSLAT.SET. The PIF files will also need to be altered accordingly.

2. RUN-TIME CONSIDERATIONS

Run the program by typing TRANSLAT at the DOS prompt (or by double-clicking its icon in a Windows 3.1 group). If the options /NOV (no verb translation), /NOA (no adverbial phrase translation) or /GUESS (guess unknown word grammar) are required, run the program by typing TRANSLAT /NOV etc., or set up an alternative .PIF file for Windows 3.1 using the PIF editor. Standard PIF files are provided, e.g. TRANSLAT.PIF and LATIN.PIF; note that these are set up for drive C and the directory \LATIN etc.

Previous versions of TRANSLAT and LATIN required a command-line option /BIN to be run on first installation. This is no longer the case.

The menu system is fairly straightforward and is based on defaults (shown by the double-lined box around one of the choices). The required option on such menus can be selected by moving the double-lined box with the arrow keys or the space bar and then pressing <ENTER>, or by pressing the highlighted letter (usually the first letter of the word or phrase). On the initial information screens, pressing any key moves to the next stage of the program.

The difference between Description and Diagnosis methods is not great - the Diagnosis method attempts to cater for the slightly different usage of the Ablative by modifying the implied "with/in (the)" to "by/in (the)".
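The two methods, together with the rule that an implied English preposition is dropped after a Latin preposition governing the ablative, can be sketched as a single gloss function. This Python sketch is an illustration under assumptions: `gloss_ablative` is a hypothetical name, the preposition set is a partial example list rather than the one TRANSLAT uses, and `previous_word` is what a one-word look-behind buffer would supply.

```python
from typing import Optional

# Partial, illustrative set of prepositions taking the ablative (in and sub do so when
# indicating position); the list TRANSLAT actually uses is not given in this document.
ABLATIVE_PREPOSITIONS = {"a", "ab", "cum", "de", "e", "ex", "in", "prae", "pro", "sine", "sub"}

def gloss_ablative(meaning: str, previous_word: Optional[str], method: str = "description") -> str:
    """English gloss for a word already parsed as ablative."""
    if previous_word is not None and previous_word.lower() in ABLATIVE_PREPOSITIONS:
        return meaning  # the Latin preposition supplies its own English preposition
    implied = "by/in (the)" if method == "diagnosis" else "with/in (the)"
    return f"{implied} {meaning}"
```

So an ablative such as foliis glossed as "leaves" becomes "with/in (the) leaves" in Description mode, "by/in (the) leaves" in Diagnosis mode, and plain "leaves" after cum.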

The choice between the GRA(mmar) and TRA(nslation) output formats really depends on whether you require full justification of the program's translation. The GRA output describes in some detail the type of word and its case, gender, number, tense etc., while the TRA output tries to replace each Latin word with its English equivalent(s) without additional padding or punctuation. See section 5 below for more information.

Generally, pressing ESCape at most points during the initial questioning brings up a box asking "Quit the program? Y/N". The translation process itself can be interrupted by pressing CTRL+C, i.e. holding down the Control key and pressing C; the same "Quit" box is then displayed.

3. PREFIXES EMBEDDED IN TRANSLAT

aequi, atro, austro*, bi*, crassi, di, extra, e, ex, hemi, hypo, infra, intra, in, multi, pachy, palaeo, pauci, per, pinnati, pluri, poly, prae, pseud, pseudo*, quadri*, quadr, quinque, quinqui, quinqu, semi*, sesqui*, sub, supra*, tripli, tri*, uni*.

Those prefixes marked with an asterisk will also be recognised if separated from the associated word by a hyphen.

In addition, the following compounds (acting as prefixes) are recognised only if followed by a hyphen: porphyr-, porphyro-.
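The hyphen rules of this section, together with the requirement that compound adjectives be hyphenated, can be summarised as a small classifier. The Python sketch below only decides how a word should be split; looking up the parts is left to the rest of the program. `split_hyphenated` and the category names are hypothetical, and the handful of exceptions mentioned in the Background are assumed to be exactly the asterisked prefixes.

```python
from typing import List, Tuple

# Prefixes marked with an asterisk in the list above (also accepted with a hyphen).
HYPHEN_OK_PREFIXES = {"austro", "bi", "pseudo", "quadri", "semi", "sesqui", "supra", "tri", "uni"}
# Compounds recognised as prefixes only when followed by a hyphen.
HYPHEN_ONLY_PREFIXES = {"porphyr", "porphyro"}

def split_hyphenated(word: str) -> Tuple[str, List[str]]:
    """Classify a word as 'plain', 'prefix' (prefix + word) or 'compound' (compound adjective)."""
    w = word.lower()
    if "-" not in w:
        return "plain", [w]
    head, rest = w.split("-", 1)
    if head in HYPHEN_OK_PREFIXES or head in HYPHEN_ONLY_PREFIXES:
        return "prefix", [head, rest]
    # Any other hyphenated word is treated as a compound adjective; each part is looked up separately.
    return "compound", w.split("-")
```

Note that `split_hyphenated("hemi-sphaericus")` is classed as a compound, because hemi is not an asterisked prefix; the unhyphenated hemisphaericus is the form the prefix search handles.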

4. COMPOUND PRONOUNS RECOGNISED BY TRANSLAT

aliqui, aliquis, alteruter, ecquis, quicumque, quicum, quidam, quilibet, quisnam, quispiam, quisquam, quisque, quisquis, quivis, uterlibet, uterque.

NOTE: compounds of adjectives and pronouns, in particular unusquisque, are not covered; you can get around this by entering the adjective separately, e.g. unus quisque or uno quoque.

5. ADDITIONAL INFORMATION ABOUT THE FILES

TRANZIP.EXE - Windows self-extracting file; by default it extracts to C:\ and correctly sets up the folder \LATIN etc.

TRANINST.EXE - self-extracting ZIP file (PKUnzip 2.04g; compatible with all flavours of WinZip etc.). Run this `program' with the mandatory parameter -d (to create the full directory structure), optionally either -f (to freshen existing files) or -o (to overwrite all files), and finally the destination drive, D:\ or C:\ (to install on drive D or C respectively), as required.

e.g. to install on D:, run as follows: A:\>TRANINST -d -o D:\

OR C:\>A:TRANINST -d [-f] D:\

(this gives subdirectories D:\LATIN\, D:\LATIN\DATA\ etc).

OR copy TRANINST.EXE to the root directory, i.e. C:\, then

C:\>TRANINST -d -o

(this version gives installation on C:)

TRASMALL.EXE - same as TRANINST.EXE but lacking 2 large files: ENGLISH.TXT and EXAMPLES.EXE

MISCDATA.EXE - the missing files from TRASMALL.EXE. These unzip in the same way as for TRANINST.EXE.

TRANSLAT.EXE - the program itself!

Run TRANSLAT /? or TRANSLAT /H for a help screen (identical to the initial welcome screen when running the program normally).

TRANSLAT.SET - details the path and name of the data files and the default location of *.LAT, *.GRA and *.TRA files (collectively known as TEXT files to TRANSLAT). This file is user-editable, although the numbers on the last line must NOT be altered unless you have made changes to ADVERBS.DAT, PHRASES.DAT or ENDINGS.DAT. Change "C:\" to "D:\" etc. if you are using a different hard disk letter, and alter the second-last line if you prefer a default subdirectory other than C:\LATIN\TEXT for storing your .LAT files. NB: this file must be in the default directory, i.e. preferably the same one as TRANSLAT.EXE. It will not be found if placed in another directory, even if that directory is in the PATH statement of AUTOEXEC.BAT.

Sample as supplied (ignore the list dots!):

[The sequence of the above numbers is: nouns, prepositions, adverbs/conjunctions, adjectives, verbs, verb-stems, phrases, endings (/GUESS command), pronouns.]
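Because these counts must match the data files, it can help to read the last line back and label each number. The Python sketch below is read-only and makes assumptions: the separator between the numbers is not documented here, so it simply extracts every integer on the last non-blank line and expects exactly nine, in the order given above. `parse_counts` and `read_counts` are hypothetical names.

```python
import re
from pathlib import Path
from typing import Dict

COUNT_FIELDS = ["nouns", "prepositions", "adverbs/conjunctions", "adjectives", "verbs",
                "verb-stems", "phrases", "endings", "pronouns"]

def parse_counts(last_line: str) -> Dict[str, int]:
    """Map the integers on TRANSLAT.SET's last line to their categories, in order."""
    numbers = [int(n) for n in re.findall(r"\d+", last_line)]
    if len(numbers) != len(COUNT_FIELDS):
        raise ValueError(f"expected {len(COUNT_FIELDS)} counts, found {len(numbers)}")
    return dict(zip(COUNT_FIELDS, numbers))

def read_counts(path: str = "TRANSLAT.SET") -> Dict[str, int]:
    """Read the file and parse its last non-blank line (nothing is written)."""
    text = Path(path).read_text(encoding="ascii", errors="replace")
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError(f"{path} is empty")
    return parse_counts(lines[-1])
```

`read_counts()["phrases"]` would then show the count to compare against PHRASES.DAT after editing it.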

ADJ.EXE - a utility program to display the contents of the ADJECTIV.DB database, including full declensions. To pull down a menu ("Show declined word" or "Quit Program"), hold the Alt key and press S or Q. Pressing Escape also exits the various levels of the program. This program expects to find ADJECTIV.DB in the C:\LATIN\DATA subdirectory; if this is not the case, the new path must be given on the command line:

e.g. >ADJ D:\LATIN

LATIN.EXE - a program to aid translation from English to Latin. The program declines Latin words (using the same databases as TRANSLAT.EXE). Although the user has to determine the basic Latin word (e.g. the nominative singular for a noun, the nominative singular masculine for adjectives and pronouns, and the 1st person present indicative active for verbs), the program supplies the correct ending as prompted by the user. It has the advantage of remembering previous usage and setting defaults accordingly. Try it by typing LATIN (older versions required LATIN /BIN on the first run only). The program prompts for a file name to use for the Latin output. Remember that pressing the Escape key at any stage during the menus will either cancel the current operation or prompt a "Do you wish to exit?" question. It requires approx. 400 KB of free memory to run.

*.LAT - user-entered text files (MUST be in DOS text format). At least one Carriage Return/Line Feed combination must be present in such files. See EXAMPLES.EXE (a self-extracting PKZIP version 2.04g zip file) for examples. Extract the files from EXAMPLES.EXE by typing `EXAMPLES -d C:\' at the DOS prompt when logged into the subdirectory \LATIN\TEXT. The files in EXAMPLES.EXE can also be viewed by typing EXAMPLES -v at the DOS prompt.
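Text prepared in a modern editor often has Unix line endings, or none at all after the last line. The Python sketch below converts text to CR/LF line endings and guarantees at least one CR/LF. It assumes code page 437 (the original IBM PC code page) because this document does not state which code page TRANSLAT expects; characters that cp437 lacks become "?". `to_dos_text` and `convert_to_lat` are hypothetical names.

```python
from pathlib import Path

def to_dos_text(text: str) -> bytes:
    """Return text with CR/LF line endings, always ending in at least one CR/LF."""
    body = "\r\n".join(text.splitlines()) + "\r\n"
    # cp437 is an assumption; characters it cannot encode are replaced with "?".
    return body.encode("cp437", errors="replace")

def convert_to_lat(source: str, target: str) -> None:
    """Hypothetical helper: read a UTF-8 text file and write a DOS-format .LAT file."""
    Path(target).write_bytes(to_dos_text(Path(source).read_text(encoding="utf-8")))
```

The target name should follow the DOS 8.3 convention, e.g. `convert_to_lat("sample.txt", "SAMPLE.LAT")`.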

*.GRA - TRANSLAT-produced translation file (DOS text format), with one Latin word per line, and associated meanings (including abbreviations for part of speech, case/number/person/tense etc) on the same line, separated by % symbols. A <TAB> precedes the first % symbol of each new word.
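The .GRA layout described above (the Latin word, a TAB, then %-separated fields) is simple to read back into another program. The Python sketch below relies only on that description: the content and order of the fields are not specified here, so they are returned as plain strings, and the sample line in the test is invented. `parse_gra_line` and `read_gra` are hypothetical names, and cp437 is an assumed code page.

```python
from typing import Iterator, List, Tuple

def parse_gra_line(line: str) -> Tuple[str, List[str]]:
    """Split one .GRA line into the Latin word and its %-separated fields."""
    word, _, rest = line.rstrip("\r\n").partition("\t")
    fields = [field.strip() for field in rest.split("%") if field.strip()]
    return word.strip(), fields

def read_gra(path: str) -> Iterator[Tuple[str, List[str]]]:
    """Yield (word, fields) for each non-blank line of a .GRA file."""
    with open(path, encoding="cp437", errors="replace") as handle:
        for line in handle:
            if line.strip():
                yield parse_gra_line(line)
```

Empty fields (for example from a trailing %) are dropped, so the result does not depend on whether a line ends with a separator.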

*.TRA - TRANSLAT-produced brief translation file (DOS text format), lacking the case/number/person indicators present in the .GRA files. Its line breaks follow those of the input .LAT file.

*.TRS files - these are binary images of QuickBasic arrays, mostly involved in indexing the database (*.DB) files. Do not attempt to change them! They are created during /BIN runs. Similar files (called *.TBL) are produced by LATIN.EXE during its /BIN process.

LATDECLN.DAT - endings for nouns, adjectives, verb forms and pronouns. NOT user-editable! Make it read-only for safety.

PHRASES.DAT - details the 2-word phrases recognised by TRANSLAT. It is user-modifiable, but remember to change the relevant entry (count of items) in TRANSLAT.SET.

ADVERBS.DAT - prepositions (first 53 entries) followed by adverbs, conjunctions and indeclinable words. The first 53 entries must be in alphabetic sequence, followed by the remaining words, also in alphabetic sequence. Only one entry is allowed for each word in each part of the file, although prepositions (entries 1-53) can be duplicated with an alternative meaning in the adverb/conjunction part (entries 54 onwards). Again, this file is user-editable (remember to change TRANSLAT.SET if necessary).
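Before editing ADVERBS.DAT, it may be worth checking these rules mechanically. The Python sketch below assumes you have already extracted the headwords in file order (how entries are laid out in the file is not documented here) and treats "alphabetic sequence" as a simple case-insensitive string comparison. It flags out-of-order words and duplicates within each part, while allowing a preposition to reappear in the adverb/conjunction part. `check_adverbs` is a hypothetical name.

```python
from typing import List

PREPOSITION_COUNT = 53  # entries 1-53 are prepositions, per the description above

def check_adverbs(headwords: List[str], n_prepositions: int = PREPOSITION_COUNT) -> List[str]:
    """Check the ordering and uniqueness rules for ADVERBS.DAT headwords (in file order)."""
    problems: List[str] = []
    parts = [("preposition", headwords[:n_prepositions], 1),
             ("adverb/conjunction", headwords[n_prepositions:], n_prepositions + 1)]
    for part_name, words, first_entry in parts:
        seen = set()
        for offset, word in enumerate(words):
            entry_no = first_entry + offset
            key = word.lower()
            if key in seen:
                problems.append(f"entry {entry_no}: duplicate {part_name} '{word}'")
            elif offset > 0 and key < words[offset - 1].lower():
                problems.append(f"entry {entry_no}: '{word}' is out of alphabetic order")
            seen.add(key)
    return problems
```

An empty list means no problems were found; the `n_prepositions` parameter exists only so that small test lists can be checked.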

*VERBSTEMS.DAT - index file for verbs - NOT user editable.

*.DB files - database files for the storage of nouns, adjectives, adverbs and verbs. Again, these can be altered only by programs not supplied with this version of TRANSLAT.
