Random Contexts
Text corpora in the SEAlang Library may be very large (well in excess of 100 million characters). Thus, searches can easily yield tens of thousands of hits.
In general, relatively few examples (100 to 1,000) will demonstrate most of a word's 'interesting' behavior in context. The SEAlang Corpus finds all appearances of a word (subject to upper limits on hits and CPU time), then subsamples them at random to yield the 'random contexts' hits (default 1,000).
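As a rough illustration of the find-then-subsample step, the Python sketch below collects hit positions up to a cap and then draws a uniform random sample from them. The function name `random_contexts`, the `max_hits` value, and the whitespace tokenization are assumptions made for this example, not the SEAlang implementation; the CPU time limit is left out.

```python
import random

def random_contexts(tokens, word, sample_size=1000, max_hits=50000, rng=None):
    """Return sorted positions of up to sample_size randomly chosen hits of word.

    Hypothetical helper: max_hits stands in for the upper limit on hits.
    """
    rng = rng or random.Random()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == word:
            hits.append(i)
            if len(hits) >= max_hits:  # stop once the assumed hit limit is reached
                break
    if len(hits) <= sample_size:
        return hits  # fewer hits than the sample size: keep them all
    return sorted(rng.sample(hits, sample_size))  # uniform sample without replacement

# Toy corpus: "the" occurs 4,000 times, so the hits are subsampled to 1,000.
tokens = ("the cat sat on the mat " * 2000).split()
sample = random_contexts(tokens, "the", sample_size=1000)
print(len(sample))  # prints 1000
```

A word with fewer hits than the sample size is returned in full, so subsampling only affects common words.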
One intentional consequence of this is that each search will produce a different sample. Thus, the exact frequency figures (and in some cases, the order) of collocates may vary. In addition, the sample contexts shown for common words will certainly vary.
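To see why collocate figures shift between searches, the Python sketch below counts the neighbors of a word in two independent random samples of the same hits. The synthetic corpus, the one-word window, and the helper name `collocate_counts` are assumptions for illustration only; how the SEAlang Corpus actually computes collocates is not described here.

```python
import random
from collections import Counter

def collocate_counts(tokens, hit_positions, window=1):
    """Count the words found within `window` tokens of each hit (hypothetical helper)."""
    counts = Counter()
    for i in hit_positions:
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the search word itself
                counts[tokens[j]] += 1
    return counts

# Synthetic corpus: 5,000 occurrences of "house", each preceded by a random adjective.
corpus_rng = random.Random(0)
vocab = ["red", "big", "old", "new"]
tokens = []
for _ in range(5000):
    tokens += [corpus_rng.choice(vocab), "house"]

hits = [i for i, tok in enumerate(tokens) if tok == "house"]

# Two "searches": each draws its own random sample of 1,000 hits.
search_rng = random.Random()  # unseeded, so every run differs
for run in range(2):
    sample = search_rng.sample(hits, 1000)
    print(run, collocate_counts(tokens, sample).most_common(4))
```

The two printed rankings will usually show slightly different counts, and closely matched collocates may swap places.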
Note that the examples (default 5) are also drawn at random.