Mon-Khmer Comparative Dictionary Tutorial

Mon-Khmer Etymological Dictionary Tutorial

The MKED is a large, custom-designed database. It rethinks everything, from the way that queries are formulated to the way that results are organized and returned.

This tutorial introduces the MKED controls. Please take a few minutes to learn to distinguish between what it:

always does (such as letting m̥ equal hm),

sometimes does (like letting vowels match breathy, creaky, diphthonged, or long variants in a normal search, but not an exact search),

can be told to do (like only searching certain sources).

Windows
The Dictionary has three windows:

search (upper left) Search phonology, orthography, or glosses; find individual items by their ID numbers, and control result display.

results (upper right) The results of any search go here.

choosing / mapping / etc. (bottom) This is where complex queries are formulated, and where additional information is shown.

As a rule, clicking in any of the three windows will restore it to full size.

The search window
Let's look at the search, return, and layout options briefly before discussing searches. These checkboxes are useful in narrowing or expanding a search:

reconstructions and/or actual citations can be searched independently.

These radio buttons control how much data is returned. The line item only option returns the least data, while full entry returns the most.

reconstruction to reflex Return the matched item and its parents.
reflex to reconstruction As above, but reversed.
full entry Return any reconstruction and all citations.
line item only Return the single matched item, sorted as requested.

Not all data has been organized into proper etymological families yet. As a rule, comparative dictionaries with reconstructions (like Shorto 2006, Diffloth 1980, or Ferlus 2xx7) are internally linked, but ordinary dictionaries (like Man 1889 or Milne 1931) are not. Some simple comparative dictionaries or wordlists (like Huffman 1971 or Banker 1979) have 'dummy' parent entries to allow basic grouping.

layout / sort by These options let the returned data be sorted by ID, IPA, gloss, or language.

Shorto MKCD manuscript entries
Use the two buttons here to see a full entry as published in Mon-Khmer Comparative Dictionary (Shorto 2006), or as found in his original manuscript. Shorto's original entry number (e.g. 11 in Sho2006:X:11) is required. Viewing the manuscript image requires the free DjVu plugin.

The results window
Citations are returned here, organized by branch. Generally, entries will be collapsed to make it easier to get an overview. Click the colored entry heading (e.g. a branch name), or the expand/collapse buttons, to expand the entries or collapse them again.

Every entry has two small buttons on the left:

X hides the entry completely. It can be shown again by collapsing, then re-expanding, the branch.

S saves a copy of the entry (and colors it to remind you that it's been saved). Clicking S opens a small popup window that shows the entry and has room for a comment. The contents of the window can be named (e.g. "pig-reflexes.htm") and saved.

The chooser tabs
The window at the bottom of the MKED screen has a number of tabs (try looking at them now). They include:

choose languages Set branch / subbranch restrictions on searches.

choose IPA chars Form phonemic search queries with IPA characters.

choose sources Include or exclude specific sources and languages.

help A summary of query formation rules.

history Details about previous queries.

issues Brief discussion of known performance issues.

map results Map that shows distribution of returned items.

Choose languages
The map reflects all the data in the MK Etymological Dictionary. You can click the colored pins to explore resources language by language. This includes viewing the entire underlying dataset.
On the right of this tab is a table of individual branches and subbranches. These can be clicked to specify the range of any search. To pick single languages, use the "choose sources" tab instead.

Choose IPA / build phones
Most interesting searches allow variation in the realization of a given phone. An allophone set can be specified using standard square bracket set notation.

Likely allophone sets can be selected in the build a phone tool. Just move the mouse around the labels, and you'll see the sets light up. Clicking on any single character copies it. Clicking on the background of any box in the IPA chart selects all the letters in that box.

Note that these are all single phones - not full syllables:

[pbɓpʰ] -- bilabial plosive set
[uʊoɔɒ] -- any back vowel
[tdɗtʰn] -- allow devoicing, and lenition of dental stop to nasal
[hs] -- allow lenition of final /s/
[pbɓpʰ] -- allow voice-quality changes in initial
[eɛæɪiəɐʌɤ]ɲ -- allow vowel raising and/or fronting conditioned by final
[ɲnŋ] -- allow shift of articulation of the final nasal
[cɟʄcʰçθsʃ] -- palatal~fricative initials

Click to copy the set to the IPA text entry in the search window with the set brackets in place.

It is important that you only create a single phone using the build tool - not a whole syllable.
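Because a phone such as pʰ is written with two characters, a set like [pbɓpʰ] cannot simply be handed to an ordinary regular-expression engine: there, the brackets match single characters, so pʰ would split into p and a stray ʰ. The Python sketch below shows one way to treat each set member as a whole phone. It is an illustration under stated assumptions, not MKED's own code: split_phones and set_to_regex are hypothetical names, and the rule that modifier letters (ʰ, ː) and combining marks attach to the preceding letter is our assumption.

```python
import re
import unicodedata

def split_phones(chars):
    """Group characters into phones: a base letter plus any modifier letters
    (Unicode category Lm, e.g. ʰ, ː) or combining marks (category Mn) after it.
    Assumption: this is how multi-character phones are recognized."""
    phones = []
    for ch in chars:
        if phones and unicodedata.category(ch) in ("Lm", "Mn"):
            phones[-1] += ch          # attach ʰ, ː, ̥ ... to the previous letter
        else:
            phones.append(ch)
    return phones

def set_to_regex(bracketed):
    """Turn a set such as [pbɓpʰ] into a regex alternation of whole phones."""
    phones = split_phones(bracketed.strip("[]"))
    # Try longer phones first, so pʰ is taken as one unit before plain p.
    phones.sort(key=len, reverse=True)
    return "(?:" + "|".join(re.escape(p) for p in phones) + ")"
```

For example, set_to_regex("[pbɓpʰ]") yields (?:pʰ|p|b|ɓ), which matches the initial of pʰa, whereas the plain character class [pbɓpʰ] followed by a does not match pʰa.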

Limit the search (not completely implemented)
The limit tab lets you include or exclude sets of sources, languages, and contributors. It offers four options:

Search only ... specify one or more sources to include.

Exclude only ... specify one or more sources to exclude. The default setting lets a subset of sources be specified (with search only), while still returning non-specified direct ancestors or descendants.

Language restriction ... choose specific branches or languages for inclusion.

Reanalysis ... the MKED encourages data addition, reuse, commentary, and reanalysis. This control limits the consideration of such material.

Phonetic searching
IPA searches are intended to accommodate the research linguist. Below, we discuss the system's default approximations, the distinction between normal and exact searches, and the syntax of regular expressions.

default query expansion As a rule, tone marks and diacritics (except for those indicating breathiness or creakiness) are always ignored. This reflects a slight compromise: some dictionary data is in phonetic rather than phonemic form, and would be difficult to locate (for the time being) without ignoring these marks.
Several approximations are always made:
- script and regular 'g' (ɡ / g) are equivalent.
- ordinary 'h' always matches IPA ʰ in specifying aspiration.
- IPA notation for voiceless consonants always matches the traditional notation (m̥ / hm).
- ordinary ':' always matches IPA 'ː' in specifying long vowels.
- V matches any vowel (including breathy, creaky, diphthong, and long variations).
- X matches any consonant (including voiced and aspirated variations).
- a|b allows and|or queries in text searches only.
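The approximations above amount to reducing both the query and the stored forms to a common comparison key. Below is a minimal Python sketch of that idea, assuming the standard IPA encodings for breathy (U+0324), creaky (U+0330), and voiceless (U+0325, U+030A) diacritics. normalize and equivalent are hypothetical helper names, and spacing tone letters (e.g. ˥) are not handled here.

```python
import unicodedata

# Assumed encodings (standard IPA; not confirmed for MKED's own data).
BREATHY = "\u0324"                  # combining diaeresis below, e.g. a̤
CREAKY = "\u0330"                   # combining tilde below, e.g. a̰
VOICELESS = {"\u0325", "\u030a"}    # ring below / ring above, e.g. m̥, ŋ̊

# Character-level equivalences from the list above.
SUBSTITUTIONS = {
    "\u0261": "g",   # script ɡ -> ordinary g
    "ʰ": "h",        # IPA aspiration -> ordinary h
    "ː": ":",        # IPA length mark -> ordinary colon
}

def normalize(form):
    """Reduce a query or a stored form to a comparison key."""
    out = []
    # NFD splits precomposed letters (á, ǎ) into base letter + combining mark.
    for ch in unicodedata.normalize("NFD", form):
        if ch in VOICELESS and out:
            # m̥ -> hm: put an 'h' in front of the consonant just emitted.
            out.insert(len(out) - 1, "h")
        elif unicodedata.category(ch) == "Mn" and ch not in (BREATHY, CREAKY):
            continue  # drop tone marks and other combining diacritics
        else:
            out.append(SUBSTITUTIONS.get(ch, ch))
    return "".join(out)

def equivalent(a, b):
    """True if two forms are equal after the default approximations."""
    return normalize(a) == normalize(b)
```

With these rules, m̥a and hma compare equal, as do pʰaː and pha:, while a breathy a̤ stays distinct from plain a.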

normal vs. exact search The two search buttons differ in the way they make certain substitutions generally associated with vowel quality. Similar 'normal' approximations may be added if they are useful:

In a normal search:

breathy, creaky, diphthong, and long vowels need not be explicitly specified. They are automatically matched. (However, the long vowel mark : is obeyed.)

In an exact search:

B following a vowel requires breathiness.

C following a vowel requires creakiness.

D following a vowel requires a diphthong (try normal bin vs. exact biDn).

L following a vowel requires a long vowel (doubling the vowel works as well).
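To make the normal/exact contrast concrete, the sketch below compiles a query into a Python regular expression in either mode. The vowel inventory is a hypothetical stand-in, diphthongs are assumed to be written as a vowel followed by a different vowel, and compile_query and search are illustrative names rather than MKED functions.

```python
import re

# Hypothetical vowel inventory; MKED's own tables are not given in this tutorial.
VOWELS = "aeiouɛɔəɐʌɤɯɨʉæɪʊɒ"
V_CLASS = f"[{VOWELS}]"
BREATHY, CREAKY, LONG = "\u0324", "\u0330", "ː"   # standard IPA marks (assumed)
QUALITY = f"[{BREATHY}{CREAKY}]"

def vowel_pattern(v, marks, exact):
    """Regex for one query vowel plus the markers typed after it."""
    v = re.escape(v)
    wants_long = ":" in marks or LONG in marks or (exact and "L" in marks)
    long_part = f"(?:{LONG}|{v})"        # ː or a doubled vowel
    glide = f"(?!{v}){V_CLASS}"          # a second, different vowel
    if not exact:
        # Normal search: quality, length and glide are optional,
        # except that an explicit ':' still requires a long vowel.
        length = long_part if wants_long else long_part + "?"
        return f"{v}{QUALITY}?{length}(?:{glide}{QUALITY}?{LONG}?)?"
    # Exact search: only what the B, C, D, L markers ask for.
    quality = (BREATHY if "B" in marks else "") + (CREAKY if "C" in marks else "")
    length = long_part if wants_long else ""
    return v + quality + length + (glide if "D" in marks else "")

def compile_query(query, exact=False):
    markers = "BCDL:ː" if exact else ":ː"
    parts, i = [], 0
    while i < len(query):
        ch = query[i]
        i += 1
        if ch not in VOWELS:
            parts.append(re.escape(ch))   # consonants match literally
            continue
        marks = ""
        while i < len(query) and query[i] in markers:
            marks += query[i]
            i += 1
        parts.append(vowel_pattern(ch, marks, exact))
    return re.compile("".join(parts))

def search(query, forms, exact=False):
    pattern = compile_query(query, exact)
    return [f for f in forms if pattern.fullmatch(f)]

FORMS = ["bin", "biːn", "biin", "biən", "bi\u0324n", "ban"]
```

Under these assumptions, the normal query bin finds every form in FORMS except ban, while the exact query biDn finds only biən.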

regular expression syntax The MKED allows these regular expressions in searches:

[abc]	allow any member of the set a, b, or c.
?	zero or 1 of the preceding.
[abc]? or a?	the set or letter is optional.
*	zero or more of the preceding.
.*	anything; zero or more letters.
[abc].* or a.*	the set or letter followed by anything.
.*[abc] or .*a	the set or letter preceded by anything.
a|b	and|or (text search only).

As noted earlier:
- V matches any vowel (including breathy, creaky, diphthong, and long variations).
- X matches any consonant (including voiced and aspirated variations).
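Since the MKED syntax is close to ordinary regular-expression notation, one way to picture it is as a translation into Python's re module, with V and X expanded into phone classes. In this sketch the vowel and consonant inventories are hypothetical, and translate is an illustrative name. V and X are wrapped in non-capturing groups so that a following ? or * applies to the whole phone rather than to its last character.

```python
import re

# Hypothetical phone inventories; MKED's real V and X tables are not given here.
VOWEL = "[aeiouɛɔəɐʌɤɯɨʉæɪʊɒ]"
CONSONANT = "[pbtdkgɡcɟʔmnɲŋlrsʃhjwɓɗʄçθfvz]"

# V: one or two vowels (diphthongs), each with optional breathy/creaky mark and length.
V = f"(?:(?:{VOWEL}[\u0324\u0330]?ː?){{1,2}})"
# X: a consonant with optional aspiration or voiceless ring.
X = f"(?:{CONSONANT}[ʰ\u0325]?)"

def translate(query):
    """Rewrite V and X outside of [...] sets; pass everything else through."""
    out, in_set = [], False
    for ch in query:
        if ch == "[":
            in_set = True
        elif ch == "]":
            in_set = False
        elif not in_set and ch == "V":
            out.append(V)
            continue
        elif not in_set and ch == "X":
            out.append(X)
            continue
        out.append(ch)
    return re.compile("".join(out))

def matches(query, form):
    return translate(query).fullmatch(form) is not None
```

For instance, pVk matches both paːk and piək, XVŋ matches kʰaŋ, and pV?k also matches pk because the whole V group is optional.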

Orthography searches
Several data sources (e.g. Vietnamese, Khmer) provide standard orthography for all entries. These can be searched; however, we have not yet implemented special support for this, so your mileage may vary.

Text searches
Text searches are limited to glosses. If the expand box is checked, English query words are expanded to match derived or inflected forms; e.g. burn matches burn, burns, burning, burned, and burnt. As noted above, and|or searches can also be specified (and will be expanded in the same way). Try rice|paddy.
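A toy version of this expansion in Python follows. The suffix rules and the IRREGULAR table are assumptions for illustration only, since the tutorial does not describe MKED's actual expansion method. Here | is read as "either word".

```python
import re

# Hypothetical irregular forms; a real expander would use a morphological lexicon.
IRREGULAR = {"burn": {"burnt"}}

def inflections(word):
    forms = {word, word + "s", word + "ed", word + "ing"}
    if word.endswith("e"):                                  # rice -> riced, ricing
        forms |= {word + "d", word[:-1] + "ing"}
    if len(word) > 1 and word.endswith("y") and word[-2] not in "aeiou":
        forms |= {word[:-1] + "ies", word[:-1] + "ied"}     # paddy -> paddies
    return forms | IRREGULAR.get(word, set())

def expand_query(query, expand=True):
    # '|' separates alternatives: a gloss matches if it contains any of them.
    words = [w.strip().lower() for w in query.split("|") if w.strip()]
    result = set()
    for w in words:
        result |= inflections(w) if expand else {w}
    return result

def gloss_matches(gloss, query, expand=True):
    targets = expand_query(query, expand)
    return any(tok in targets for tok in re.findall(r"[a-z]+", gloss.lower()))
```

With expansion on, burn matches the gloss "to be burning"; with it off, only the bare word burn would match.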

ID searching
Every item in the database has a unique identifier, composed of:

an authorID of the form XxxNNNN

a type, which is R(econstruction), C(itation), L(ink), or N(ote),

a number, which reflects the source layout or numbering.

Any item can be searched by its ID. Alternatively, setting only the authorID is an effective way to limit the search to just one source.
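Assuming, as in the Sho2006:X:11 example above, that an item's ID begins with its authorID, checking the XxxNNNN form and restricting results to one source might look like this. is_author_id and limit_to_source are hypothetical names, and the Huf1971 ID below is made up for illustration.

```python
import re

# authorID: one capital letter, two lowercase letters, four digits (e.g. Sho2006).
AUTHOR_ID = re.compile(r"[A-Z][a-z]{2}[0-9]{4}")

def is_author_id(s):
    return AUTHOR_ID.fullmatch(s) is not None

def limit_to_source(item_ids, author_id):
    """Keep only items whose ID begins with the given authorID (assumed ID layout)."""
    if not is_author_id(author_id):
        raise ValueError(f"not an authorID of the form XxxNNNN: {author_id!r}")
    return [i for i in item_ids if i.startswith(author_id)]

IDS = ["Sho2006:X:11", "Huf1971:C:203", "Sho2006:X:12"]   # illustrative IDs
```

Here limit_to_source(IDS, "Sho2006") keeps only the two Shorto items, while an ill-formed authorID such as Shorto2006 is rejected.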

Source data modification
Entry indexing Brackets as used in typical reconstructions are implicitly expanded. Thus, all of the possibilities implied by typical notation for reconstructions can be searched directly. For example, a head that is listed as *b[h]raap can be searched as either braap or bhraap.
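The expansion can be sketched in Python by treating each bracketed segment as optional and taking every combination. expand_brackets is a hypothetical name. Note that these reconstruction brackets mark optional material, unlike the [abc] sets used in search queries.

```python
import itertools
import re

def expand_brackets(head):
    """List every form implied by optional [..] segments in a reconstruction."""
    head = head.lstrip("*")                        # drop the reconstruction asterisk
    pieces = re.split(r"(\[[^\]]*\])", head)       # keep bracketed parts as separate pieces
    choices = [
        ["", p[1:-1]] if p.startswith("[") and p.endswith("]") else [p]
        for p in pieces
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]
```

Indexing every variant this way lets a query for either braap or bhraap reach the same head *b[h]raap; two optional segments, as in *b[h]ra[a]p, would yield four variants.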

Inferred glosses Glossing is inconsistent in comparative dictionaries, and it is not uncommon for reconstructions or citations to be left unglossed. To fill in the blanks, we supply a placeholder gloss, taken from a related entry, in parentheses. When more than one gloss is relied on (as in Diffloth1984, where a proto Monic gloss is formed from the Mon and Nyah Kur citations), a double slash "//" separates them.
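A sketch of this display rule, assuming borrowed glosses are joined with a spaced double slash inside a single pair of parentheses (the exact layout MKED uses may differ; display_gloss is a hypothetical name):

```python
def display_gloss(own_gloss, borrowed_glosses):
    """Return the entry's own gloss, or a parenthesized placeholder from related entries."""
    if own_gloss:
        return own_gloss
    if not borrowed_glosses:
        return ""
    # Several source glosses are separated by a double slash.
    return "(" + " // ".join(borrowed_glosses) + ")"
```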

Inferred dialects As a rule, we attempt to assign each language a dialect marker. Usually this will be a village or regional name, but on occasion a particular author may be strongly identified with a particular language variety, and lends his or her name to the dialect.

Font requirements
IPA input and display work best with the Charis SIL or Doulos SIL fonts.