Munda Comparative Dictionary

About the SEAlang Munda Etymological Dictionary

The Munda Languages Project's primary resources are this Etymological Dictionary, built to support work in comparative and historical linguistics, and a companion Languages Database devoted to preservation and sharing of language and lexical resources. Please read the MKED Tutorial and Cookbook if you're a first-time user.

Organization

Data are obtained from a range of sources, including:

etymological dictionaries that include proposed proto-language reconstructions,

comparative dictionaries that group citations into etymological or semantic sets, but do not propose reconstructions,

"linguist's lexicons" that provide careful phonemic renderings and brief glosses, and

ordinary dictionaries that may or may not include phonemic rendering.

Project technology, including the underlying software architecture and documentation, is based on the results of the Mon-Khmer Languages Project (CRCL / NEH 2007-2011). We are pleased to collaborate with Paul Sidwell and the ANU on the Proto-Austroasiatic Lexicon Project (Sidwell / ARC 2012-2016). This collaboration will result in complete reconstructions for proto-Austroasiatic and its branches and sub-branches (Sidwell), and an AA languages website with gold-standard comparative datasets (CRCL).

Capabilities

The Etymological Dictionary provides four basic functions:

searching data, based on phonemic, orthographic, or semantic queries,

organizing results into comparative or historical sets,

restricting searches, based on language and/or source,

naming datasets and individual items for citation and reuse.

Developing reliable mechanisms for on-line collaboration is a central project goal. New datasets of citations, reconstructions, relations, and comments are welcome, and are readily added to the database. However, all datasets are individually identified: every user can easily decide which sets to include or exclude from searches.
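As a rough illustration of how individually identified datasets let a user include or exclude sets from a search, consider the Python sketch below. It is not the dictionary's actual interface: the `Entry` record, the `search` function, its parameters, and the dataset names are all hypothetical, and the records are invented placeholders rather than real dictionary data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    dataset: str   # identifier of the dataset the entry belongs to
    language: str
    form: str
    gloss: str


def search(entries, gloss, include=None, exclude=(), languages=None):
    """Return entries whose gloss contains `gloss`, honoring dataset and language filters."""
    results = []
    for entry in entries:
        # Keep only explicitly included datasets, if an include list is given.
        if include is not None and entry.dataset not in include:
            continue
        # Drop any dataset the user has excluded.
        if entry.dataset in exclude:
            continue
        # Optionally restrict the search by language.
        if languages is not None and entry.language not in languages:
            continue
        if gloss.lower() in entry.gloss.lower():
            results.append(entry)
    return results


# Invented placeholder records, not real dictionary data.
ENTRIES = [
    Entry("set-A", "Lang1", "form1", "water"),
    Entry("set-B", "Lang2", "form2", "water"),
    Entry("set-B", "Lang1", "form3", "fire"),
]

hits = search(ENTRIES, "water", exclude={"set-B"})
print([e.form for e in hits])
```

Because every entry carries its dataset identifier, excluding a contributed set is a filter applied at query time rather than a change to the stored data.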

Data entry and indexing

As noted above, data sources are inconsistently organized. We make every effort to:

expose data for searching, e.g. by expanding bracketed reconstructions. For example, a head that is originally listed as *b[h]raap may be searched as braap or bhraap.

extend queries in a manner that meets the user's intention. For example, unvoiced consonant variants are automatically included in searches, as are breathy, creaky, diphthongized, or long vowel variations. This behavior can be overridden.

preserve non-explicit information from original sources. For example, dialect identifiers, glosses, and derivational relations may be inferred, or phonemic equivalents may be added.
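The bracket expansion and query extension described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's indexing code: the function names `expand_brackets` and `with_unvoiced` are hypothetical, stripping the leading asterisk is an assumption, and the small voiced-to-unvoiced table is invented for the example (the source does not list the pairs the dictionary actually uses).

```python
import itertools
import re


def expand_brackets(head: str) -> list[str]:
    """Return every searchable form of a head with optional [bracketed] segments."""
    # Assumption: the leading asterisk marking a reconstruction is not searched.
    form = head.lstrip("*")
    # With a capturing group, re.split alternates plain text (even indexes)
    # with bracketed contents (odd indexes).
    parts = re.split(r"\[([^\]]*)\]", form)
    # Each bracketed segment may be absent or present.
    choices = [
        [part] if i % 2 == 0 else ["", part]
        for i, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


# Hypothetical voiced-to-unvoiced pairs; the real table is not given in the source.
DEVOICE = str.maketrans({"b": "p", "d": "t", "g": "k"})


def with_unvoiced(query: str) -> list[str]:
    """Return the query plus its fully devoiced variant, without duplicates."""
    variants = [query, query.translate(DEVOICE)]
    return list(dict.fromkeys(variants))


print(expand_brackets("*b[h]raap"))  # ['braap', 'bhraap']
print(with_unvoiced("braap"))        # ['braap', 'praap']
```

A head with several bracketed segments yields every combination, so two optional segments produce four searchable forms.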

Experimental features

The Munda Etymological Dictionary is an experimental laboratory as well as a working resource. Our concerns include:

community development: Discussing dataset content and analysis is critical to the linguistics community. We are seeking to discover and define the middle ground between the overly restrictive methods of the past (passing manuscripts from hand to hand) and the unconstrained Wiki-style publication seen today.

query specification: Historical language change and inconsistent data quality can make formulating useful phonemic queries extremely difficult. Our work on IPA query builders, and on both phonemic and notational approximation, is intended to help account for language drift and variations in research practice.

database design: The underlying design of the Munda database is extraordinarily simple: it contains only citations, reconstructions, comments, and relations. We wish to see if this experimental approach will continue to allow us to manipulate and extend the database, while preserving the rich web of relations that characterize comparative language data.
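To make the four-part design concrete, here is one way such a store could be modeled in Python. It is a sketch under assumptions, not the actual Munda database schema: the class names, field names, relation labels, and the sample records are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    id: str
    kind: str   # "citation", "reconstruction", or "comment"
    text: str   # the cited form, reconstructed form, or comment body


@dataclass
class Relation:
    source: str  # id of the record the relation starts from
    target: str  # id of the record it points to
    label: str   # hypothetical label such as "reflex-of" or "comment-on"


@dataclass
class Database:
    records: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add(self, record: Record) -> None:
        self.records[record.id] = record

    def relate(self, source: str, target: str, label: str) -> None:
        self.relations.append(Relation(source, target, label))

    def pointing_to(self, target: str, label: str) -> list:
        """Return the records linked to `target` by relations with this label."""
        return [
            self.records[r.source]
            for r in self.relations
            if r.target == target and r.label == label
        ]


# Invented placeholder records, not real dictionary data.
db = Database()
db.add(Record("r1", "reconstruction", "*proto-form"))
db.add(Record("c1", "citation", "form1"))
db.add(Record("c2", "citation", "form2"))
db.add(Record("n1", "comment", "placeholder remark"))
db.relate("c1", "r1", "reflex-of")
db.relate("c2", "r1", "reflex-of")
db.relate("n1", "r1", "comment-on")

print([rec.text for rec in db.pointing_to("r1", "reflex-of")])
```

In this sketch all structure lives in the relations, so a new kind of link between records needs only a new label rather than a change to the record types.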
