Preparing the Myanmar-English Dictionary for on-line use requires some reanalysis and extension of existing dictionary data. Among other things, every compound word is segmented into its component headwords. This lets every headword lookup also return all compounds that use the head. In contrast, the underlying dictionary only lists compounds that begin with the current headword; thus, houseboat appears under house, but not boat.
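The lookup behaviour described above amounts to an inverted index from each component headword to the compounds that contain it. The following is a minimal sketch of that idea, not the project's actual code: the `SEGMENTED` table, the function names, and the English stand-in entries are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical segmented entries: compound -> its component headwords.
# English stand-ins are used here in place of real dictionary data.
SEGMENTED = {
    "houseboat": ["house", "boat"],
    "boathouse": ["boat", "house"],
    "household": ["house", "hold"],
}


def build_component_index(segmented):
    """Map every component headword to the set of compounds that use it."""
    index = defaultdict(set)
    for compound, parts in segmented.items():
        for part in parts:
            index[part].add(compound)
    return index


def compounds_using(index, headword):
    """Return all compounds containing the headword, in any position."""
    return sorted(index.get(headword, ()))


def compounds_starting_with(segmented, headword):
    """Mimic the print layout: only compounds whose first component matches."""
    return sorted(c for c, parts in segmented.items() if parts[0] == headword)


index = build_component_index(SEGMENTED)
print(compounds_using(index, "boat"))               # ['boathouse', 'houseboat']
print(compounds_starting_with(SEGMENTED, "boat"))   # ['boathouse']
```

With this layout, a lookup of boat finds houseboat as well as boathouse, whereas a first-component listing finds only boathouse.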
Here, we describe some of the remaining problems. The most significant issues entail:
segmenting compound-word entries into their component headwords, and
linking each component to the etymologically correct headword entry.
We have individually checked, and successfully resolved, some 20,000 compound words. However, some 554 compounds could not be segmented into headwords accurately, and another 1,377 terms found within compounds could not be linked to proper headwords. Extending the Myanmar-English dictionary to include these headwords remains an important, unfinished task.
Below, we briefly describe the types of problem we have encountered, and note the tag (in bold) used to mark each. The items themselves are collected in:
unsure-seg.htm: compounds that could not be segmented properly.
unsure-ref.htm: components that could not be properly linked to headwords.
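The sorting of unresolved items into these two collections can be pictured as two checks applied in order: first whether a segmentation exists, then whether every component links to a headword. The sketch below assumes a simple in-memory representation; the `classify` function, the `HEADWORDS` set, and the English sample entries are hypothetical and do not reflect the actual checking procedure, which was carried out individually by hand.

```python
# Hypothetical set of headwords known to the dictionary (English stand-ins).
HEADWORDS = {"house", "boat", "phone"}


def classify(parts, headwords):
    """Classify one compound by its proposed segmentation.

    parts is a list of component strings, or None when no confident
    segmentation could be found.
    """
    if not parts:
        return "unsure-seg"   # could not be segmented properly
    if any(part not in headwords for part in parts):
        return "unsure-ref"   # a component has no headword to link to
    return "resolved"


# Illustrative entries only; not taken from the dictionary data.
samples = {
    "houseboat": ["house", "boat"],
    "telephone": ["tele", "phone"],   # "tele" is not a headword here
    "helter-skelter": None,           # no confident segmentation
}

for compound, parts in samples.items():
    print(compound, classify(parts, HEADWORDS))
```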
1. Problems in linking to etymologically correct headwords
x (unsure) – not clear if the segmentation is correct:
ယဥ္က္ယေး က္ယေး (polite; courteous; gentle; well-bred; civilized. )
ငုိခ္ယင္း ခ္ယင္း (plaintive piece of poem or prose; dirge; threnody; song of lamentation; wailing song )
a (add) – segmentation is correct, but a necessary headword is not in the dictionary:
လူစား = လူ + စား (စား from အစား (kind; class; type))
2. Problems in segmenting compounds into proper headwords
x (unsure) – unsure of proper segmentation:
တံပုိး = တံ + ပုိး? (horn (musical instrument) )
ယဥ္က္ယေး = ယဥ္ + က္ယေး? (polite; courteous; gentle; well-bred; civilized. )
s (split) – an unanalyzed headword (like helter-skelter) is decomposed and reconstructed (e.g. not-helter-not-skelter):
မရုိမသေ = မ + ရုိ + မ + သေ (ရုိသေ = respect (headword), မ = negative.)
r (rhyme) – compound includes a rhyme that may or may not be a headword:
ခ္ယဥ္ဖ္ရုံးဖ္ရုံး = ခ္ယဥ္ + ဖ္ရုံး + ဖ္ရုံး (ခ္ယဥ္ = sour)
c (combining) – one element of the compound is the combining form of a headword (usually of Indic origin):
ပတ္တလား = Sans ဝာဒ္ယ+Myan တလား (ပတ္တလား = xylophone )
မ္ရ္ဝေလမ္ပာယ္ = မ္ရ္ဝေ + (အ)လမ္ပာယ္ ( အ taken out) (မ္ရ္ဝေလမ္ပာယ္ = snake charmer. )
l (loan) – compound includes a Pali/Sanskrit word, or a combined Pali/Sanskrit + Myanmar word, that is not a dictionary headword (e.g. as English tele- might not be a dictionary headword):
က္ယမ္းဂန္ = က္ယမ္း (Myanmar) + ဂန္ (Pali) (က္ယမ္းဂန္ = literary work of classical standard )
m (modified) – one element of a compound has been modified, e.g. by losing a tone mark:
ပ္ရည္သူ့ကောင္စီ = ပ္ရည္သူ + tone mark + ကောင္စီ
a (absent) – the word should reasonably be a dictionary headword.