SEAlang Library - Random Contexts Help

Random Contexts

Text corpora in the SEAlang Library may be very large (well in excess of 100 million characters). Thus, searches can easily yield tens of thousands of hits.

In general, relatively few examples (100 to 1,000) will demonstrate most of a word's 'interesting' behavior in context. The SEAlang Corpus finds all appearances of a word (subject to upper limits on hits and CPU time), then subsamples them at random to yield the 'random contexts' hits (default 1,000).
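
The subsampling step described above can be pictured with a short Python sketch. This is an illustration only, not the SEAlang implementation: the function name subsample_hits, its parameters, and the use of integers as stand-ins for hits are all assumptions made for the example.

```python
import random

def subsample_hits(hits, limit=1000, rng=None):
    """Return at most `limit` hits chosen uniformly at random, without replacement.

    `hits` is a list of every match found for the search term.
    `limit` mirrors the 'random contexts' default of 1,000 (hypothetical parameter name).
    """
    rng = rng or random.Random()
    if len(hits) <= limit:
        # Fewer hits than the limit: nothing to subsample, keep them all.
        return list(hits)
    return rng.sample(hits, limit)

# A search with 50,000 hits is reduced to 1,000 randomly chosen ones.
all_hits = list(range(50000))
sample = subsample_hits(all_hits)
print(len(sample))
```

Sampling without replacement means no hit appears twice in the result, and a rare word with fewer hits than the limit is returned in full.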

One intentional consequence of this is that each search will produce a different sample. Thus, the exact frequency figures of collocates (and in some cases, their order) may vary from one search to the next. In addition, the sample contexts shown for common words will certainly vary.
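
To see why collocate figures shift between searches, consider the following Python sketch. It is a toy model under stated assumptions, not the SEAlang code: the function collocate_counts, the representation of each context as a list of neighboring words, and the made-up data are all hypothetical.

```python
import random
from collections import Counter

def collocate_counts(contexts, limit=1000, rng=None):
    """Count neighboring words over a random subsample of contexts.

    `contexts` is a list in which each item is the list of words found
    near one hit (an assumed, simplified representation).
    """
    rng = rng or random.Random()
    sample = contexts if len(contexts) <= limit else rng.sample(contexts, limit)
    return Counter(word for context in sample for word in context)

# Invented data: 10,000 hits, each with a single neighboring word.
contexts = [["rice"]] * 6000 + [["water"]] * 3000 + [["field"]] * 1000

# Two searches for the same word draw two different samples.
first = collocate_counts(contexts)
second = collocate_counts(contexts)
print(first.most_common(3))
print(second.most_common(3))
```

In this toy data the counts for each collocate usually differ slightly between the two runs while staying close to the underlying proportions. Collocates with similar frequencies may therefore swap places in the ranking from one search to the next.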

Note that the examples (default 5) are also drawn at random.