Fixed search: don't resegment the search query.
Robust search: automatically generate all alternative segmentations before searching the title/author corpus.
Robust & rough search: also tries every combination. In addition, it will ignore MARC diacritics (which usually show vowel length). This lets us find records that have not been catalogued following ALA-LC guidelines.
- long vowels (e.g. ā) match short ones (a);
- ư matches u;
- o̜ matches o;
- ʻ is ignored;
- æ matches ae;
- œ matches oe.
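The rough-matching rules listed above can be sketched in Python as a folding step applied to both the query and the catalogue text before comparison. This is only an illustration, not SEAcat's actual code: the names `rough_key`, `LIGATURES` and `TURNED_COMMA` are hypothetical, and the sketch assumes the input is Unicode text and that every combining mark may be dropped, which is broader than the six rules listed.

```python
import unicodedata

# Letters with no canonical decomposition are mapped by hand (assumed table).
LIGATURES = {"\u00e6": "ae", "\u00c6": "AE", "\u0153": "oe", "\u0152": "OE"}
TURNED_COMMA = "\u02bb"  # the "ʻ" sign, ignored in rough matching


def rough_key(text: str) -> str:
    """Fold romanized text to a diacritic-free key for rough comparison."""
    # NFD splits e.g. a-macron into "a" + combining macron and
    # u-horn into "u" + combining horn.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        if unicodedata.category(ch) == "Mn":
            continue  # drop combining marks (macron, horn, half ring below, ...)
        if ch == TURNED_COMMA:
            continue  # the turned comma is ignored
        out.append(LIGATURES.get(ch, ch))
    return "".join(out)
```

Two strings then match roughly when their keys are equal, e.g. `rough_key("ph\u0101s\u0101") == rough_key("phasa")`.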
Return links / html: "Links" is a very compact form, while "html" prints the complete record.
Romanize for interchange: use character components (e.g. diacritics like dots and macrons) rather than precomposed characters. This follows the MARC21 interchange standard, and makes it easier to cut-and-paste SEAcat output.
Romanize for looks: use precomposed characters whenever possible. This will greatly improve text appearance in most fonts. However, a font that is designed to handle diacritics well (like Doulos SIL) will usually render both character components and precomposed characters properly.
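For Unicode text, the difference between the two romanization options resembles the difference between the decomposed (NFD) and composed (NFC) normalization forms. The sketch below uses Python's standard `unicodedata` module to show the idea; the function names are hypothetical, and it is an assumption that SEAcat's output behaves like these normalization forms.

```python
import unicodedata


def romanize_for_interchange(text: str) -> str:
    """Decomposed form: base letter followed by separate combining marks."""
    return unicodedata.normalize("NFD", text)


def romanize_for_looks(text: str) -> str:
    """Composed form: precomposed characters wherever Unicode defines one."""
    return unicodedata.normalize("NFC", text)


# a-macron is one code point when composed, two when decomposed.
a_macron = "\u0101"
print(len(romanize_for_interchange(a_macron)))
print(len(romanize_for_looks("a\u0304")))

# "o" + combining left half ring below has no precomposed character,
# so it stays as two code points even in the composed form.
print(len(romanize_for_looks("o\u031c")))
```

The last case illustrates "whenever possible": some letter and diacritic pairs have no precomposed equivalent and are always output as components.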
Check word breaks: automatically generate all alternative segmentations before searching the title/author corpus. Return counts only.
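As an illustration of what generating alternative segmentations and returning counts could look like, the Python sketch below assumes the query has already been split into syllables and that a segmentation is simply a choice of "space" or "no space" at each syllable boundary. The function names, the in-memory title list and the plain substring match are all hypothetical simplifications, not SEAcat's method.

```python
from itertools import product


def segmentations(syllables):
    """Return every way of joining adjacent syllables into words."""
    if not syllables:
        return []
    results = []
    # One choice per boundary: "" joins two syllables, " " separates them.
    for joins in product(("", " "), repeat=len(syllables) - 1):
        parts = [syllables[0]]
        for sep, syl in zip(joins, syllables[1:]):
            parts.append(sep + syl)
        results.append("".join(parts))
    return results


def count_hits(syllables, titles):
    """Count, for each segmentation, the titles that contain it."""
    return {
        seg: sum(seg in title for title in titles)
        for seg in segmentations(syllables)
    }
```

A query of n syllables yields 2^(n-1) candidates, so `segmentations(["phasa", "thai"])` gives `"phasathai"` and `"phasa thai"`, and `count_hits` reports how many titles in the sample list contain each one.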
Check pronunciation: if possible, check native orthography against an authoritative reference. This is particularly helpful for SEA words of Indic origin (these are not always nativized to the same extent).