SEAlang Lab

SEAlang Lab: assistive technology for reading, writing, and vocabulary acquisition in complex-script languages

Doug Cooper, Project Director

The SEAlang Lab project (abstract below) is funded for 2006-2009 by the US Department of Education's International Research and Studies program.

Prepared learning resources for less commonly taught languages (LCTLs) are rarely sufficient. Nevertheless, broad reading of authentic texts usually helps mitigate the lack of ancillary study or reference materials and even, for some self-directed students, of teachers or native speakers.

But dozens of languages in Southeast, Central, and South Asia and the Middle East (including Arabic, Thai, and Urdu) require complex scripts. They use non-Roman alphabets, non-linear ordering, context-dependent letterforms or ligatures, implicit vowels, tones, and registers, and may not even put spaces between words. Students who turn to authentic materials are deadlocked: they must read extensively to acquire vocabulary, yet cannot read until they already have that vocabulary.

The SEAlang Lab takes a new approach to reading, writing, and vocabulary acquisition for complex-script LCTLs. Our goals are well-defined and achievable; we will build:

the Reader's Workbench: a free, online complex-script reading tool that increases reading speed and accuracy at all levels, using any text (such as Web-accessible newspapers). Besides integrating dictionary and corpus data, the Workbench provides word and phrase segmentation, automates phonetic transcription, highlights core-vocabulary coverage, evaluates text difficulty, and even finds appropriately graded texts for the student.

the Writer's Workbench: a free, online complex-script writing tool. The tool addresses both mechanical ability (using predictive completion from local script, phonetic transcription, or letter-by-letter transliteration) and syntactic/expressive competence (via integrated corpus and collocational reference tools).

the Vocabulary Workbench: a free, online data-driven drill and test tool. Using novel heuristics that draw on dictionary and corpus data, the Workbench dynamically generates review and reinforcement based on student queries, arbitrary texts, or required word lists. It can be coupled to the Reader's Workbench, or used standalone as a teacher's assistant.
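As a rough illustration of two of the Reader's Workbench functions named above (word segmentation and core-vocabulary coverage), the following minimal sketch uses greedy longest-match lookup against a word list. It is not the project's method: the names `segment` and `core_coverage`, the three-word toy lexicon, and the core list are all hypothetical, and a real tool would draw on full dictionary and corpus data.

```python
def segment(text, lexicon):
    """Greedy longest-match segmentation of an unspaced string.

    Assumes `lexicon` is a non-empty set of known words (hypothetical data).
    Characters that match no word are emitted one at a time.
    """
    max_len = max(map(len, lexicon))
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking toward one character.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                break
        # If nothing matched, j is i + 1 here: single-character fallback.
        tokens.append(text[i:j])
        i = j
    return tokens


def core_coverage(tokens, core_words):
    """Fraction of tokens that appear in a core-vocabulary list."""
    if not tokens:
        return 0.0
    return sum(t in core_words for t in tokens) / len(tokens)


# Toy Thai example: "I eat rice", written without spaces.
lexicon = {"ฉัน", "กิน", "ข้าว"}   # hypothetical mini-dictionary
core_words = {"ฉัน", "กิน"}        # hypothetical core wordlist
text = "ฉันกินข้าว"

tokens = segment(text, lexicon)
coverage = core_coverage(tokens, core_words)
print(tokens, round(coverage, 2))
```

Greedy longest-match is only the simplest baseline; it can mis-segment ambiguous strings, which is one reason dictionary and corpus frequency data matter for a practical tool.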

The Workbenches are enabled by enriched lexical resources derived from the USED/CRCL-sponsored SEAlang Library by extending the functionality of e-text corpora and adding word origin, frequency, difficulty, collocational, usage, and core-wordlist data to existing dictionaries.

The SEAlang Lab proposal is submitted by the Center for Research in Computational Linguistics (CRCL Inc., a US 501(c)(3) nonprofit). Its design has benefited greatly from wide, public demonstration of our preliminary development, most recently in the invited plenary presentation at the September 2005 Interagency Language Roundtable in Washington, DC.

All tools will be class-tested and improved in collaboration with the nation's best intensive language programs: the Foreign Service Institute, the Defense Language Institute, the University of Wisconsin-Madison Center for Southeast Asian Studies, and the Southeast Asian Studies Summer Institute.

Our working language is Thai. Unlike Arabic or Urdu, Thai has all of the characteristic complex-script problems, including lack of word separation. Southeast Asia's strategic importance to the US makes Thai very attractive as a cross-training source or target vis-à-vis Khmer, Burmese, and Lao, as well as Shan and other ethnic minority languages. Finally, despite its difficulty, Thai has sufficient existing resources and students for timely, cost-effective development and testing.