N-grams data


You can freely download files containing the 1,000,000 most frequent 2-, 3-, 4-, and 5-grams. For each n-gram size, you can choose between the following three formats. This sample page shows samples from the 3-grams only, but you can download all 12 files (2-, 3-, 4-, and 5-grams, each in three formats) after a quick registration.

Note 1. Click here for a listing of the part of speech tags (see also the notes at the bottom of that page, regarding tags like ii31).

Note 2. To save space, only one part of speech is listed for each word, even if the tagger originally suggested two or three options. The PoS listed is the one that the tagger ranked most likely.

Note 3. The columns refer to: frequency, word1, word2, word3 (pos1, pos2, pos3). (More information)
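
As an illustration of the column layout in Note 3, here is a minimal Python sketch for reading one line of a 3-gram file. It assumes that the columns are separated by whitespace (for example tabs) and that no word or tag contains a space; neither is stated on this page, so check the downloaded files first. The names Trigram and parse_trigram_line, the sample line, and the tags T1, T2, T3 are made up for the example.

```python
from typing import NamedTuple, Optional, Tuple


class Trigram(NamedTuple):
    frequency: int
    words: Tuple[str, str, str]
    pos_tags: Optional[Tuple[str, str, str]]  # None when the line has no PoS columns


def parse_trigram_line(line: str) -> Trigram:
    """Parse 'frequency word1 word2 word3 [pos1 pos2 pos3]'.

    Assumption: columns are whitespace-separated and contain no spaces.
    """
    fields = line.split()
    if len(fields) not in (4, 7):
        raise ValueError(f"expected 4 or 7 columns, got {len(fields)}: {line!r}")
    frequency = int(fields[0])
    words = (fields[1], fields[2], fields[3])
    pos_tags = (fields[4], fields[5], fields[6]) if len(fields) == 7 else None
    return Trigram(frequency, words, pos_tags)


# Hypothetical sample line; the frequency and tags are placeholders, not real data.
example = parse_trigram_line("100\tone\tof\tthe\tT1\tT2\tT3")
print(example.frequency, example.words, example.pos_tags)
```

A line without part-of-speech columns is parsed the same way, with pos_tags set to None.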