About the SEAlang Library Waray Text Corpus
This monolingual corpus consists of roughly half a million words of literary text taken from two sources:
-- a set of some 204 texts collected and graded by WarayLanguage.org, and used here with their permission. This set contains roughly 340,000 words.
-- 461 texts from the Waray Literary Text collection, prepared by the Philippine Languages Online Corpora project (discussed in dita_roxas2011philippine.pdf), and accessible via Palito. It includes some 160,000 words.
The WarayLanguage.org site has done a considerable amount of work with its corpus, including analysis of difficulty levels and extraction of wordlists organized by both difficulty and semantics.
The Philippine Languages Online Corpora project has gathered comparable sets of literary and
religious texts for the eight major languages of the Philippines, as well as an extensive set
of sign-language videos.
Usage
Try searching for anak.
- context searches show how the search target appears in context, taking both leading and trailing collocates (or neighboring words) into account. This search returns a merged list of leading and trailing collocates.
- collocate searches are better for focusing on the search target's immediate neighbor. This search returns separate lists of leading and trailing collocates.
- merged view allows for fast switching between collocate and context views. Try brief first: downloaded pages may be very large, and a slow browser may fall behind in displaying the detailed view. The Go! button invokes the brief view.
- raw contexts show the search word in context without any attempt at analysis or sanity-checking (local segmentation that helps ensure that a real word has been found).
- restrict collocates requires (or forbids) all collocates to have at least one sense with a particular part of speech or usage.
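The difference between the separate lists of a collocate search and the merged list of a context search can be illustrated with a minimal sketch. This is not the SEAlang implementation: the function names `collocates` and `merged_collocates` are hypothetical, and apart from the search word anak the tokens are placeholders (`w1`, `w2`, ...) rather than real Waray text.

```python
from collections import Counter

def collocates(tokens, target):
    """Count the words immediately before (leading) and after (trailing) each hit."""
    leading = Counter()
    trailing = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        if i > 0:
            leading[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            trailing[tokens[i + 1]] += 1
    return leading, trailing

def merged_collocates(tokens, target):
    """Combine leading and trailing counts into one list, as a context search does."""
    leading, trailing = collocates(tokens, target)
    return leading + trailing

# Placeholder token stream; only "anak" comes from the page itself.
tokens = ["w1", "anak", "w2", "w3", "anak", "w2", "w1", "anak", "w4"]

leading, trailing = collocates(tokens, "anak")
print("leading: ", leading.most_common())
print("trailing:", trailing.most_common())
print("merged:  ", merged_collocates(tokens, "anak").most_common())
```

Keeping the two counters separate preserves which side of the target a neighbor appears on; adding them loses that information but gives a single ranked list.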
Additional tips
Because the underlying text corpus may be quite large, results may be taken from a random sample of hits. For common words, this means that sample contexts and exact collocate frequencies will vary from run to run.
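Why sampled results vary from run to run can be sketched as follows. This is an illustration only, assuming a simple uniform sample without replacement; the actual sampling method is not described here, and `sample_hits`, `max_hits`, and `seed` are hypothetical names.

```python
import random

def sample_hits(hit_positions, max_hits, seed=None):
    """Return at most max_hits positions, chosen uniformly at random.

    Rare words (fewer hits than max_hits) are returned in full, so their
    results are stable; common words get a different subset on each run
    unless a seed is fixed.
    """
    hits = list(hit_positions)
    if len(hits) <= max_hits:
        return hits
    rng = random.Random(seed)
    # Sort so the sampled hits stay in corpus order.
    return sorted(rng.sample(hits, max_hits))

# A common word with 1000 hits, capped at 50 per search.
common = range(1000)
run_a = sample_hits(common, 50)
run_b = sample_hits(common, 50)
print(len(run_a), len(run_b), run_a == run_b)
```

Any frequency computed from such a sample is an estimate of the true corpus frequency, which is why exact counts for common words differ between searches.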
Clicking on a word or collocate starts a new search: yellow searches for contexts, and black searches for collocates.
Copyright notices
All texts included in these corpora are copyrighted by their original authors, but have been released for use in this context.
Texts from WarayLanguage.org are used by permission.
Texts from the Philippine Languages Online Corpora project are accessible via
Palito, and were prepared with
support from De La Salle University, the Philippine Federation of the Deaf, and
the Philippines National Commission for Culture and the Arts.