Southeast Asian Languages Library Cataloging Tools
Library cataloging of Southeast Asian texts follows two distinct
paths, relying on local orthographies within Southeast Asia, and
on romanization elsewhere.
The SEAcat tools serve three purposes:
- assist in generating accurate and consistent Library of Congress
  romanization from local orthography;
- allow searches of romanized records using queries in local orthography, and
  provide more sophisticated tools for searching SEA records in general;
- assess the feasibility of converting existing romanized records back
  to their original SEA orthographies.
About Library of Congress Romanization
The most widely used (and in many instances, required) approach to
romanization is the
system approved by the US Library of Congress and American Library Association,
and published as the
ALA-LC Romanization Tables: Transliteration Schemes for Non-Roman Scripts
(1997). Scans of most pages are available at
www.loc.gov/catdir/cpso/roman.html, and
copies of mainland SEA sections are linked on the left.
Unfortunately, as defined for the mainland SEA writing systems,
the ALA-LC system has serious shortcomings.
First, although all four writing systems are based on the same underlying
Indic script design, their ALA-LC romanizations do not share a common design.
There is some overlap, but each SEA orthography is romanized with its own
set of symbols; e.g. ṅ vs ng, or œ vs oe.
This has undermined the development of regional tools and expertise.
Second, by design, the romanization is neither strictly phonetic nor
strictly orthographic.
As a result, romanized text is extremely difficult to read sensibly
(for example, the Lao and Thai systems do not indicate tone).
This is not a problem as long as the ALA-LC romanization is used for
its intended purpose as a convenient notation for cataloging.
However, complex implementation rules undermine the system.
Thai has more than a dozen pages of rules that determine when and where
words should be divided, as though the cataloging information
were going to be read like ordinary text.
These rules do little to make the text more understandable; rather, they
create countless opportunities for inconsistent cataloging.
In the end, the system is neither readable nor reliable.
The best solution will be to convert catalog entries back
to the original Southeast Asian orthographies.
This cannot be done automatically, because ALA-LC romanization is a
lossy, many-to-one system.
On the other hand, we can develop tools that greatly
improve the productivity of human back-conversion.
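The many-to-one problem can be illustrated with a minimal Python sketch. It is not the SEAcat implementation: the table is a tiny, simplified subset of Thai consonant mappings given here as an assumption for illustration (the ALA-LC tables remain the authority), and the names TO_ROMAN, invert, candidates, and romanize are hypothetical. The sketch assumes the romanized input has already been split into tokens. It shows why a tool can only propose candidates and a person must still choose among them.

```python
from itertools import product

# Assumed, heavily simplified subset of a Thai romanization table.
# Several local consonants share one romanized form (many-to-one).
TO_ROMAN = {
    "ก": "k",
    "ข": "kh",
    "ค": "kh",
    "ฆ": "kh",
    "ส": "s",
    "ศ": "s",
    "ษ": "s",
    "ซ": "s",
}


def invert(table):
    """Map each romanized form to every local letter that produces it."""
    inverse = {}
    for local, roman in table.items():
        inverse.setdefault(roman, []).append(local)
    return inverse


FROM_ROMAN = invert(TO_ROMAN)


def romanize(local_text):
    """Romanize letter by letter; the original spelling is not recoverable."""
    return [TO_ROMAN[ch] for ch in local_text]


def candidates(roman_tokens):
    """List every local-script string that romanizes to the given tokens."""
    options = [FROM_ROMAN[tok] for tok in roman_tokens]
    return ["".join(choice) for choice in product(*options)]


if __name__ == "__main__":
    tokens = ["kh", "s"]
    found = candidates(tokens)
    # 3 letters romanize as "kh" and 4 as "s", so 3 * 4 = 12 candidates.
    print(len(found), "candidate spellings for", tokens)
```

Even with two tokens the candidate list has twelve entries, and it grows multiplicatively with length. A practical tool would need to rank or filter these candidates, for example against a dictionary, before presenting them to a cataloger.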
The SEAlang SEAcat Tools have been thoroughly tested with Firefox 1.5
(Windows XP and Linux 2.6.14), Netscape 7.2 (Windows XP),
and Safari 2.0 (Apple OSX 10.4).
Browsers that do not comply with W3C standards
(in particular, Microsoft Internet Explorer) are not supported,
and the Library resources will not display properly with them.