Tagalog Text Corpus

About the SEAlang Library Tagalog Text Corpus 
This corpus contains more than two million words taken from:
 - examples extracted from the Ramos Tagalog-English Dictionary,
 - texts from the Tagalog Literary Text collection (some 200,000 words), prepared by the Philippine Languages Online Corpora project (discussed in dita_roxas2011philippine.pdf) and accessible via Palito,
 - various Internet sources, including material located by Kevin Scannell as part of An Crúbadán, his corpus-building project for minority languages.
Try searching for bata.
 - context searches show how the search target appears in context, taking both leading and trailing collocates (or neighboring words) into account. This search returns a merged list of leading and trailing collocates.
 - collocate searches are better for focusing on the search target's immediate neighbors. This search returns separate lists of leading and trailing collocates.
 - merged view allows for fast switching between collocate and context views. Try the brief view first: downloaded pages may be very large, and a slow browser may fall behind in displaying the detailed view. The Go! button invokes the brief view.
 - raw contexts show the search word in context without any attempt at analysis or sanity-checking (local segmentation that helps ensure that a real word has been found).
 - restrict collocates requires every collocate to have (or forbids it from having) at least one sense with a particular part of speech or usage.
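The difference between the separate lists returned by a collocate search and the merged list returned by a context search can be illustrated with a short Python sketch. This is not the site's actual implementation, which is not documented here: the function name `collocates`, the `window` parameter, and the toy sentence are all assumptions made for illustration.

```python
from collections import Counter

def collocates(tokens, target, window=1):
    """Count the words found up to `window` positions before and after `target`.

    Hypothetical helper for illustration; not the corpus tool's own code.
    """
    leading, trailing = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        # Words immediately to the left of the hit.
        leading.update(tokens[max(0, i - window):i])
        # Words immediately to the right of the hit.
        trailing.update(tokens[i + 1:i + 1 + window])
    return leading, trailing

# Invented toy text, not taken from the corpus.
tokens = "ang bata ay masaya at ang bata ay natutulog".split()

# Collocate-style result: two separate lists.
leading, trailing = collocates(tokens, "bata")

# Context-style result: one merged list of leading and trailing collocates.
merged = leading + trailing
```

Here `leading` and `trailing` correspond to the separate lists, and `merged` to the combined list; a larger `window` would take more distant neighbors into account.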
Additional tips
Because the underlying text corpus may be quite large, results may be taken from a random sample of hits. For common words, this means that sample contexts and exact collocate frequencies will vary from run to run.
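The effect of sampling on collocate frequencies can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the site's own code: the function name `sampled_collocates`, the `sample_size` and `seed` parameters, and the invented hit list are all hypothetical.

```python
import random
from collections import Counter

def sampled_collocates(hits, sample_size, seed=None):
    """Count collocates over a random sample of hits (hypothetical helper).

    `hits` holds one neighboring word per occurrence of the search target.
    """
    rng = random.Random(seed)
    if len(hits) <= sample_size:
        # Rare words: every hit is used, so the counts are exact.
        sample = hits
    else:
        # Common words: only a random subset of the hits is counted.
        sample = rng.sample(hits, sample_size)
    return Counter(sample)

# Invented data: 100 hits with three different trailing collocates.
hits = ["ay"] * 60 + ["na"] * 30 + ["pa"] * 10

exact = sampled_collocates(hits, sample_size=1000)   # fewer hits than the limit
approx = sampled_collocates(hits, sample_size=20)    # unseeded, so it may differ between runs
```

With fewer hits than the sample size the counts are exact and repeatable; once sampling applies, the counts depend on which hits happen to be drawn, which is why frequencies for common words are not stable across runs.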
Clicking on a word or collocate starts a new search: yellow items search for contexts, and black items search for collocates.
Copyright notices
Tagalog Dictionary was prepared by the Pacific and Asian Linguistics Institute (PALI) of the University of Hawaii pursuant to Peace Corps contract PC 25-1507.  An edition of this work was published as part of the PALI series in 1971 by the University of Hawaii Press. 
Texts from the Philippine Languages Online Corpora project are accessible via Palito, and were prepared with support from De La Salle University, the Philippine Federation of the Deaf, and the Philippines National Commission for Culture and the Arts.