N-grams data


This page contains very short samples from COCA for the three different levels of n-grams. In all cases, the samples on this page are limited to about 1,000 lines so that they load quickly. Longer samples, more samples, and more information can be found via the "More information and samples" links below.

LEVEL 1: 1,000,000 n-grams for each of 2-, 3-, 4-, and 5-grams. Samples here are for 3-grams only (n-grams containing the word "like"). More information and samples...

LEVEL 2: All 2-, 3-, 4-, and 5-grams occurring at least three times. Samples here are just for words starting with [U]; they are case sensitive and include part of speech. More information and samples...

LEVEL 3: All 2-, 3-, and 4-grams, even those that occur just once (hundreds of millions of rows of data).
These n-gram sets are meant to be used in a database, where Sets 1 and 2 are joined together (via SQL commands) to produce something like Set 3. More information and samples...
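
As a rough illustration of what joining two sets via SQL can look like, here is a minimal sketch in Python using the standard sqlite3 module. This page does not describe the actual table layout, so everything in the sketch is an assumption made for illustration: the table names (lexicon, trigrams), the column names, the idea that one set stores n-grams as integer word IDs while the other maps IDs to words and part of speech, and the rows and frequencies themselves, which are invented and are not COCA data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables.

    Assumption: one set is a lexicon (word ID -> word, part of speech) and
    the other stores 3-grams as word IDs plus a frequency. The real data
    may be organized differently.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE lexicon (word_id INTEGER PRIMARY KEY, word TEXT, pos TEXT);
        CREATE TABLE trigrams (freq INTEGER, w1 INTEGER, w2 INTEGER, w3 INTEGER);
        """
    )
    # Invented toy rows, not real corpus frequencies.
    conn.executemany(
        "INSERT INTO lexicon VALUES (?, ?, ?)",
        [
            (1, "i", "pron"),
            (2, "would", "modal"),
            (3, "like", "verb"),
            (4, "looks", "verb"),
            (5, "a", "det"),
        ],
    )
    conn.executemany(
        "INSERT INTO trigrams VALUES (?, ?, ?, ?)",
        [(120, 1, 2, 3), (45, 4, 3, 5)],
    )
    return conn


def readable_trigrams(conn):
    """Join the ID-based 3-grams to the lexicon once per word position."""
    query = """
        SELECT t.freq, l1.word, l2.word, l3.word
        FROM trigrams AS t
        JOIN lexicon AS l1 ON l1.word_id = t.w1
        JOIN lexicon AS l2 ON l2.word_id = t.w2
        JOIN lexicon AS l3 ON l3.word_id = t.w3
        ORDER BY t.freq DESC
    """
    return conn.execute(query).fetchall()


rows = readable_trigrams(build_demo_db())
for row in rows:
    print(row)
```

The lexicon table is joined three times under different aliases because each word position in a 3-gram needs its own lookup. With hundreds of millions of rows, an index on the ID columns would likely matter, but that depends on the database actually used.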


COHA: From the Corpus of Historical American English. These samples are case sensitive and include part of speech. More information and samples...


Spanish and Portuguese: From the Corpus del Español and the Corpus do Português. These samples are not case sensitive, but they do include part of speech. More information and samples...