Thai Lexicography Resources

Maintained by Doug Cooper (bugs to doug.cooper.thailand at gmail.com)
Center for Research in Computational Linguistics, Bangkok http://crcl.th.net

 
Proto-Tai'o'Matic

     
English match full word     Thai match start end of word
Results must include at least one of:   Li   Luo   Brown   Jonsson
Allow additional information from:   Li   Luo   Brown   Jonsson
Include required item even if modern Thai reflex is unknown
Compress output (one line per entry)

Sort results (Applies to Li only unless noted; all others sorted by Thai head)
  number   Li section number   Jonsson item number  
  word   Thai   gloss  
  initial   PT   PSW   modern  
  vowel   PT   PSW   modern  
  final   modern (others derive directly)  
  tone   PT  
  other   PT vowel rule     show sort keys for debugging
The Tool: 
Proto-Tai'o'Matic merges, searches, and extends several wordlists and proposed reconstructions of proto-Tai and Southwestern Tai. These include [Li77], [Brown65a], [Jonsson91], and [Luo97]. Details of each data set are discussed below.  See intersection stats.
   Proto-Tai'o'Matic is primarily intended to be a resource for studying the etymology of modern Thai. It also provided an interesting test bed for some research and implementation problems (e.g., generating Li's reconstructions, and doing approximate phonetic and semantic matching when merging the lists).
   While the tool is useful for study related to proto-Tai reconstruction, it doesn't incorporate all possible data that might be garnered from the sources listed. I'd be more than happy to include this (or additional) data if supplied.
Usage tips: 
- Searches allow either English or Thai, detected automatically.
- To search by author, do not provide a search key.
- To list all of an author's entries, choose only that person's must include, as well as ... even if modern reflex is unknown.
- To compare two authors, either select must include for both (returns every entry from both), or select must include for one and allow... for the other (every entry for the first, but only matching entries from the second).
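The must include / allow... distinction in the last two tips can be read as a filter over source labels. The Python sketch below is only an illustration of that reading: the entry layout (a head word mapped to the set of sources that attest it), the placeholder heads, and the name select_entries are all assumptions, not the tool's actual implementation.

```python
# Hypothetical merged data: head word -> set of sources that attest it.
# The heads are placeholders, not real entries from any of the lists.
ENTRIES = {
    "head-a": {"Li", "Jonsson"},
    "head-b": {"Jonsson"},
    "head-c": {"Brown"},
    "head-d": {"Li"},
}


def select_entries(entries, must_include, allow=frozenset()):
    """Keep entries attested by at least one 'must include' source.

    Each kept entry reports only the sources that were asked for,
    either as required ('must include') or as extra ('allow').
    """
    must_include = set(must_include)
    visible = must_include | set(allow)
    result = {}
    for head, sources in entries.items():
        if sources & must_include:
            result[head] = sources & visible
    return result


# Must include for both authors: every entry from either one.
both = select_entries(ENTRIES, {"Li", "Jonsson"})

# Must include Li, allow Jonsson: every Li entry, plus Jonsson's
# information only where it matches a Li entry.
li_plus = select_entries(ENTRIES, {"Li"}, allow={"Jonsson"})
```

Under these assumptions, both contains head-a, head-b, and head-d, while li_plus contains only head-a (with both sources) and head-d.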
Data: 
[Li77]  Li provides glossed Thai phonetics, along with a proposed reconstruction of each word's initial consonant(s) and tone. I've followed Li's rules and expanded his data to provide complete proto-Tai and proto-Southwestern Tai reconstructions. For example:
      ปีน  piin "to climb" 
            Li: [ * (A1) 4.1:60 (rule 14.3.4) ]   PSW: *piin   PT: *pi«n
In this case, the item is found in Li's section 4.1, page 60.
   About a third of Li's entries include proposed proto-Tai vowels. Finals are not supplied, consistent with Li's comment that 'the final consonant system is kept in most dialects with very little change' (p. 58).
   As a rule, the proto-Southwestern reconstruction can be assembled unambiguously from the information Li provides explicitly, by combining his reconstructed initial with the modern vowel and final.
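As a minimal sketch of that assembly step, assuming the three pieces are already available as plain strings (the function name assemble_psw is hypothetical, and tone is left out), the proto-Southwestern form is just the concatenation of Li's initial with the modern vowel and final. The initial *p used here is read off the PSW form *piin in the example above.

```python
def assemble_psw(initial, modern_vowel, modern_final=""):
    """Join a reconstructed initial with the modern vowel and final.

    The initial may be given with or without its leading asterisk;
    the result always carries exactly one.
    """
    return "*" + initial.lstrip("*") + modern_vowel + modern_final


# piin "to climb": initial *p, modern vowel ii, modern final n.
psw = assemble_psw("*p", "ii", "n")
print(psw)  # *piin
```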
   The proto-Tai reconstructions, in turn, can almost invariably be completed on the basis of proto-Tai initial and modern vowel by following rules Li lays out in the text. About half can be derived directly, while the remainder (about 400) usually require inspecting a modern representative of Northern Tai like Po-ai.
   Proto-Tai'o'Matic uses a combination of automatic generation and hand inspection. All inferred proto-Tai vowels are labeled with their derivation rule (the number of the section in which the derivation is described). If the exception rule can't be tested (usually for lack of Po-ai evidence), we go with the default, because when the rule can be tested, the default is usually correct.
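The fallback policy described above can be sketched as follows. This is an illustration only: the function name, the status labels, and the idea of passing the exception check in as a predicate are assumptions, and the vowel values in the usage lines are placeholders rather than anything taken from Li's rules.

```python
def infer_pt_vowel(default_vowel, exception_vowel, exception_applies, po_ai_form=None):
    """Choose a proto-Tai vowel and say how the choice was made.

    exception_applies is a caller-supplied predicate over the Po-ai form.
    When no Po-ai form is available, the exception cannot be tested,
    so the default is used.
    """
    if po_ai_form is None:
        return default_vowel, "default (untested)"
    if exception_applies(po_ai_form):
        return exception_vowel, "exception"
    return default_vowel, "default (tested)"


# Placeholder values; the predicate stands in for whatever the rule checks.
has_marker = lambda form: "x" in form
untested = infer_pt_vowel("V1", "V2", has_marker)
tested = infer_pt_vowel("V1", "V2", has_marker, po_ai_form="pan")
excepted = infer_pt_vowel("V1", "V2", has_marker, po_ai_form="pax")
```

Returning the status alongside the vowel keeps the untested defaults distinguishable from the ones that were actually checked.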
[Brown65a]  Brown's modern Thai wordlist appears on pages 215 - 221, with a few subsequent additions. Brown began with all words that appeared in at least three of the five major dialect areas he studied, identified as Lao, Shan (speakers from near Chiang Rai and Kaen Thao), and Northern, Central, and Southern Thai. His 1,387-word list meets the requirement that at least one of the Shan dialects be represented, while a narrower 818-word list requires that Chiang Rai Shan be included.
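Brown's selection criteria can be expressed as a small membership test. The sketch below is one reading of the paragraph above, not Brown's own procedure: the dialect labels and function names are assumptions, and it assumes the narrower list applies the same three-area threshold with Chiang Rai Shan additionally required.

```python
# Assumed labels for the two Shan dialects; both count as one dialect area.
SHAN = {"Shan (Chiang Rai)", "Shan (Kaen Thao)"}


def dialect_areas(dialects):
    """Collapse attested dialects into dialect areas."""
    return {"Shan" if d in SHAN else d for d in dialects}


def in_wide_list(dialects):
    """At least three of the five areas, with some Shan dialect attested."""
    dialects = set(dialects)
    return len(dialect_areas(dialects)) >= 3 and bool(SHAN & dialects)


def in_narrow_list(dialects):
    """As above, but Chiang Rai Shan specifically must be attested."""
    return in_wide_list(dialects) and "Shan (Chiang Rai)" in set(dialects)


word = {"Lao", "Central", "Shan (Kaen Thao)"}
print(in_wide_list(word), in_narrow_list(word))  # True False
```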
   Brown's list is given in Thai, with occasional glosses. In some cases, his intended sense is not clear; the glosses we provide for these items are marked conj. (conjectured). Words not found in the Chiang Rai Shan list are marked no Shan cognate. There is no attempt to reconstruct Brown's 'ancient Thai.'
[Jonsson91]  Jonsson reconstructs Southwestern Tai, as defined by Li, on the basis of cognates from Ahom (A), Khamti (K), Shan (S), Lue, Tai Neua (TN), Siamese/written Thai (W), Lao (L), White Tai (WT), Black Tai (BT), and Red Tai (RT).
   Jonsson's list is given in phonetics, with specific Thai hints. I've retained her Appendix B item numbers, and have added the starting page of each sublist; e.g., B2-29 refers to item 29 from the sublist that begins on page B2. Jonsson's gloss of the proto form is included.
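For anyone processing these labels programmatically, a label such as B2-29 splits into a sublist page and an item number. The sketch below assumes every label has the shape "B" plus digits, a hyphen, then the item number; only B2-29 is attested here, so that pattern and the function name parse_jonsson_id are assumptions.

```python
import re

# Assumed shape: sublist starting page (B + digits), hyphen, item number.
ITEM_RE = re.compile(r"^(B\d+)-(\d+)$")


def parse_jonsson_id(label):
    """Split a label like 'B2-29' into (sublist page, item number)."""
    match = ITEM_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized item label: {label!r}")
    return match.group(1), int(match.group(2))


print(parse_jonsson_id("B2-29"))  # ('B2', 29)
```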
[Luo97]  Luo's Reconstructions of New Cognates in Tai: a Supplement to Li77 is found in Appendix 1, pp. 235 - 314. He provides Thai phonetics with English glosses; I was not able to identify about forty of the modern Thai forms.
   In this implementation, Luo entries are marked only by their English glosses - reconstructions are not included.
References
[Brown65a] Brown, J. Marvin (1965) From Ancient Thai to Modern Dialects. In From Ancient Thai to Modern Dialects, and Other Writings on Historical Thai Linguistics, pp. 69-254. White Lotus, Bangkok. ISBN 974-8495-07-8.
[Jonsson91] Jonsson, Nanna L. (1991) Proto Southwestern Tai. Ph.D. dissertation, available from UMI.
[Li77] Li, Fang Kuei (1977) A Handbook of Comparative Tai. University Press of Hawaii.
[Luo97] Luo, Yongxian (1997) The Subgroup Structure of the Tai Languages: a Historical-Comparative Study. Ph.D. dissertation, published as Journal of Chinese Linguistics, Monograph Series Number 12, 1997. Project on Linguistic Analysis. ISSN 0091-3723.
Thanks
I would appreciate any suggestions or corrections. I'd be happy to make page contents available in a programmer-friendly format (i.e., as labeled items) if somebody will help me verify their accuracy first.
   Thanks to Namfon Buntua, Nuusai Inthimaat, Chris Court, Gong Qunhu, and Stephen Morey for their contributions to this project. Any errors are, of course, my own.