SAMPLES: words (+ PoS)
You can purchase n-gram sets that contain all 1-, 2-, 3-, 4-, and 5-grams that
occur at least four times in the one-billion-word Corpus of
Contemporary American English. The sample files available on this page
include the first 50,000 entries for words beginning with the
letter [m]. An explanation of the columns in these sample
files is also available.
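As a rough illustration of what "all 1- to 5-grams that occur at least four times" means, the following Python sketch counts every contiguous n-gram in a list of tokens and keeps only those that meet the frequency threshold. The function name `ngram_counts` and the toy token list are hypothetical; this is a minimal sketch of the counting idea, not the procedure used to build the actual data sets.

```python
from collections import Counter


def ngram_counts(tokens, max_n=5, min_count=4):
    """Count all contiguous 1..max_n-grams and keep those seen at least min_count times."""
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of length n across the token list.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    # Apply the frequency threshold (four occurrences by default).
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Hypothetical toy input: "make more money" repeated four times.
tokens = "make more money make more money make more money make more money".split()
result = ngram_counts(tokens)
```

With this input, the trigram ("make", "more", "money") appears four times and is kept, while the bigram ("money", "make") appears only three times and is dropped.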
When you purchase the data, you can use either the
"word" format (the format of the samples on this page) or the "wordID
+ lexicon" format. The "word" format comes in two different options,
and again, you have access to both when you purchase the data.
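To illustrate the general idea behind an ID-plus-lexicon layout, here is a minimal Python sketch. It assumes, hypothetically, that each n-gram row is a frequency followed by integer word IDs and that the lexicon maps each ID to a word form; the actual column layout of the purchased files is not described here and may differ. The names `lexicon` and `decode_row` are illustrative only.

```python
# Hypothetical lexicon: integer word ID -> word form (real file layout may differ).
lexicon = {1: "make", 2: "more", 3: "money"}


def decode_row(row, lexicon):
    """Split a hypothetical (frequency, id1, ..., idN) row and map each ID to its word."""
    frequency, *word_ids = row
    return frequency, tuple(lexicon[w] for w in word_ids)


# Decode one hypothetical trigram row.
freq, words = decode_row((4, 1, 2, 3), lexicon)
```

Storing small integers instead of repeating full word strings in every row generally makes large n-gram tables more compact, at the cost of a lookup step like the one above.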
You can also download all of these sample files as
one ZIP file.