SEAlang Library:  Programmer’s Guide

(preliminary guide, Summer 2006)

 

The SEAlang Library is designed to provide both interactive, web page-based interfaces and program-queryable resources.  The basic system architecture is:

   SEAlang Library data resources

      mostly flat XML files

      some pre-built hashes / in-memory filesystem

      built for comfort, not for speed

(which support)

         the SEAlang Library query server:

             http://sealang.net        

(which handles)

               language-specific SEAlang Library web page user interfaces     

                    http://thai.sealang.net                          

                    http://lao.sealang.net, etc.

(as well as)

               program-generated queries

                   http://sealang.net/dictionary?lang=thai&ipa=baan& …

                   http://sealang.net/corpus?lang=thai&orth=บ้าน&...

 

Web services are provided by a dedicated Linux server (P4 3.0 GHz, with 2 GB RAM), using Apache / mod_perl.  All CGI and server scripts are written in Perl.

Data resources are held as flat XML-tagged text files.  Tagging generally follows the Text Encoding Initiative (TEI) P5 standard; variations, if any, do not violate its spirit.  In some cases where server performance is an issue, data has been extracted, built into (Perl) associative arrays, and saved in a (Linux) in-memory filesystem.  From the web server's point of view, we find that this provides practically the same performance as a persistent mod_perl solution, but is easier to maintain.
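The extract-and-cache step described above can be sketched as follows.  This is only an illustration in Python, not the project's Perl code: the element names in the sample (entry, form, orth, pron, def) are TEI-like but assumed, the file name dictionary_index.pickle is made up, and /dev/shm is assumed as the in-memory filesystem (it is the usual tmpfs mount on Linux, but the guide does not say which one is used).

```python
import pickle
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical TEI-like fragment; the real SEAlang markup may differ.
SAMPLE_XML = """
<body>
  <entry><form><orth>บ้าน</orth><pron>baan</pron></form><def>house</def></entry>
  <entry><form><orth>น้ำ</orth><pron>naam</pron></form><def>water</def></entry>
</body>
"""


def build_index(xml_text):
    """Map each headword (orth) to the list of its definitions."""
    index = {}
    for entry in ET.fromstring(xml_text.strip()).iter("entry"):
        orth = entry.findtext("form/orth")
        definition = entry.findtext("def")
        if orth and definition:
            index.setdefault(orth, []).append(definition)
    return index


def default_cache_dir():
    """Prefer the Linux tmpfs mount (assumed); fall back to the temp directory."""
    shm = Path("/dev/shm")
    return shm if shm.is_dir() else Path(tempfile.gettempdir())


def save_index(index, directory):
    """Serialize the pre-built hash so that later requests can skip XML parsing."""
    path = Path(directory) / "dictionary_index.pickle"  # hypothetical file name
    with open(path, "wb") as handle:
        pickle.dump(index, handle)
    return path


def load_index(path):
    """Read the pre-built hash back; this is the only work done per request."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

A build script would call save_index(build_index(xml_text), default_cache_dir()) once, and each short-lived request handler would then call load_index(), which is what makes a persistent server process unnecessary.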

     We do not use any database software per se.  Our primary goal is to build datasets that are comprehensible, easily maintained, readily extended, and open to access in unanticipated and innovative ways.  Although we want fast performance for basic operations, scaling up (i.e. maintaining performance under load) isn’t really an issue.  There’s just not that much data, and we don’t really expect heavy, simultaneous user demand.

Query servers respond to simple HTTP requests generated both by SEAlang Library web pages and by anonymous programs.  Data is returned as XML-tagged text for display or further processing.  Although we considered using more elaborate protocols (e.g. SOAP), these appeared to have little benefit – and very definite disadvantages in terms of complexity – in attempting to build a user community for our data.  In contrast, the ‘RESTful’ approach of HTTP requests with query arguments (as attribute=value pairs) is easy to understand and work with.
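As a rough illustration of a program-generated query, the sketch below builds a request URL from attribute=value pairs and pulls text out of an XML reply.  It is written in Python with the standard library only.  The service path and the lang, ipa and orth arguments are taken from the example URLs earlier in this guide; the helper names and the shape of the reply are assumptions, since the response format is not yet published.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

BASE_URL = "http://sealang.net"


def build_query_url(service, **attributes):
    """Join a service name and attribute=value pairs into a request URL."""
    return f"{BASE_URL}/{service}?{urlencode(attributes)}"


def fetch(url, timeout=10):
    """Issue the HTTP GET and return the reply as text (assumed to be UTF-8)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def extract_text(xml_text, tag):
    """Collect the text of every element named `tag` in an XML reply."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(tag) if element.text]


# Romanized lookup, as in the dictionary example above.
print(build_query_url("dictionary", lang="thai", ipa="baan"))

# Local-orthography lookup; urlencode percent-encodes the Thai text as UTF-8.
print(build_query_url("corpus", lang="thai", orth="บ้าน"))
```

Because the whole request is an ordinary URL, the same query can be pasted into a browser, issued from a shell, or generated by a program in any language.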

Query services will be formally defined and published over the life of the project.  At present, our own web pages rely on a rough set of common-sense attributes, e.g.:

  Attribute      Values

  language       Thai, Lao, Khmer … (derived automatically for Unicode languages)
  orthography    a query string in local orthography
  phone          query string in conventional romanization
  definition     query string for reverse search
  service        dictionary, corpus, bitext
  match          syllable, word, complete … (part of the dictionary service)
                 context, collocate … (part of the corpus service)
  return         gui, result, data … (i.e. the full GUI, the framed result page, other data)

 

and so on.  We anticipate beginning to publicize these as soon as SEAlang has data for its first few scheduled languages.
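Until the attributes are formally defined, a client can treat the list above as a small lookup table.  The Python sketch below is one hypothetical way to do that: it flags attribute names and values that do not appear in the list, and it only reports them rather than rejecting the query, because the value lists above are explicitly open-ended.  The names KNOWN_VALUES, FREE_TEXT and check_query are invented for this example.

```python
# Values named in this guide; the lists are incomplete by design.
KNOWN_VALUES = {
    "service": {"dictionary", "corpus", "bitext"},
    "match": {"syllable", "word", "complete", "context", "collocate"},
    "return": {"gui", "result", "data"},
}

# Attributes whose value is a language name or a free query string.
FREE_TEXT = {"language", "orthography", "phone", "definition"}


def check_query(query):
    """Return a list of readable warnings; an empty list means nothing was flagged."""
    warnings = []
    for name, value in query.items():
        if name in FREE_TEXT:
            if not value.strip():
                warnings.append(f"{name}: empty value")
        elif name in KNOWN_VALUES:
            if value not in KNOWN_VALUES[name]:
                warnings.append(f"{name}: unrecognized value {value!r}")
        else:
            warnings.append(f"unrecognized attribute {name!r}")
    return warnings


print(check_query({"service": "dictionary", "match": "word", "orthography": "บ้าน"}))
print(check_query({"service": "soap", "phone": " "}))
```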

Web page services provided by the SEAlang Library website play a dual role.  First, they manage user queries: they perform initial error and sanity checking, and massage the queries into the forms required by the query servers.  As part of this task, they also interpret and report errors.  Second, they manage query server responses as necessary, either by building an XHTML wrapper that refers to an appropriate CSS file before including the returned XML data, or by repackaging the data in some other way.
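The XHTML wrapper mentioned above might look roughly like the following Python sketch.  It is not the site's actual Perl code: the function name, the stylesheet name thai.css and the sample fragment are placeholders, and the returned XML is assumed to be a single well-formed fragment that the CSS file knows how to style.

```python
from xml.sax.saxutils import escape, quoteattr

XHTML_NS = "http://www.w3.org/1999/xhtml"


def wrap_in_xhtml(xml_fragment, css_href, title):
    """Embed returned XML in a minimal XHTML page that links a stylesheet first."""
    return (
        f'<html xmlns="{XHTML_NS}">\n'
        "<head>\n"
        f"<title>{escape(title)}</title>\n"
        # quoteattr adds the surrounding quotes and escapes the attribute value.
        f'<link rel="stylesheet" type="text/css" href={quoteattr(css_href)} />\n'
        "</head>\n"
        "<body>\n"
        f"{xml_fragment}\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical reply fragment and stylesheet name.
page = wrap_in_xhtml("<entry><def>house</def></entry>", "thai.css", "Dictionary result")
print(page)
```

One simplification to note: the embedded elements inherit the XHTML default namespace here, which a production wrapper may want to handle explicitly.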