This site is open for test purposes only. Not all functions work yet, and corpus content will change.
About the SEAlang Library Vietnamese Text Corpus
This monolingual corpus consists of Vietnamese texts published on the Internet, sampled here for research and educational purposes. We are using a combination of newspaper, literary, and Wikipedia texts.
- context searches show how the search target appears in context, taking both leading and trailing collocates (or neighboring words) into account. This search returns a merged list of leading and trailing collocates.
- collocate searches are better for focusing on the search target's immediate neighbor. This search returns separate lists of leading and trailing collocates.
- merged view allows for fast switching between collocate and context views. Try the brief view first: downloaded pages may be very large, and a slow browser may fall behind in displaying the detailed view. The Go! button invokes the brief view.
- raw contexts show the search word in context without any attempt at analysis or sanity-checking (local segmentation that helps ensure that a real word has been found).
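The difference between separate and merged collocate lists can be illustrated with a minimal sketch. It is not the site's implementation: the `collocates` function, the toy sentence, and the whitespace tokenization are all assumptions made for illustration, and real Vietnamese text would need proper word segmentation.

```python
from collections import Counter

def collocates(tokens, target):
    """Count the words immediately before and after each occurrence of target.

    Hypothetical helper for illustration; not part of the SEAlang site.
    """
    leading, trailing = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        if i > 0:
            leading[tokens[i - 1]] += 1      # word just before the target
        if i + 1 < len(tokens):
            trailing[tokens[i + 1]] += 1     # word just after the target
    return leading, trailing

# Toy sample, split on whitespace only (an assumption).
tokens = "tôi học tiếng Việt và anh học tiếng Anh".split()

# Collocate-style result: two separate lists.
leading, trailing = collocates(tokens, "học")

# Context-style result: one merged list of leading and trailing neighbors.
merged = leading + trailing
```

Here `leading` and `trailing` correspond to the separate lists of a collocate search, while `merged` corresponds to the combined list of a context search.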
Usage tips
Because the underlying text corpus may be quite large (more than 50 million characters in this implementation), results may be taken from a random sample of hits. For common words, this means that sample contexts and exact collocate frequencies will vary from run to run.
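The following sketch suggests why sampled frequencies vary between runs. It assumes a simple scheme in which a fixed number of hits is drawn at random whenever there are more hits than the limit. The function name, the sample size, and the toy data are hypothetical, and the site's actual sampling method is not documented here.

```python
import random
from collections import Counter

def sampled_trailing_counts(tokens, target, sample_size, seed=None):
    """Count trailing collocates of target over a random sample of its hits.

    Hypothetical helper for illustration; not part of the SEAlang site.
    """
    rng = random.Random(seed)
    # Every position where the target occurs and has a following word.
    hits = [i for i, tok in enumerate(tokens)
            if tok == target and i + 1 < len(tokens)]
    # Sample only when there are more hits than the limit.
    if len(hits) > sample_size:
        hits = rng.sample(hits, sample_size)
    return Counter(tokens[i + 1] for i in hits)

# Toy corpus in which "a" is a very common word.
tokens = ("a b a c a b a d " * 50).split()

# Two runs with different random draws may report different frequencies.
run1 = sampled_trailing_counts(tokens, "a", 20, seed=1)
run2 = sampled_trailing_counts(tokens, "a", 20, seed=2)
```

A rare word with fewer hits than the limit is counted in full, so its frequencies stay the same from run to run.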
Clicking on a word/collocate with the mouse starts a new search: yellow searches for contexts, and black searches for collocates.
Look for continuing development of SEAlang Library Vietnamese resources.