N-grams data

Corpus of Contemporary American English



This page contains very short samples from COCA for the three different levels of n-grams. In all cases, the samples on this page are limited to about 1,000 lines so that they load quickly. Longer samples, more samples, and more information can be found in the "more information" links below.

LEVEL 1: 1,000,000 n-grams for each of the 2-, 3-, 4-, and 5-gram sets. Samples here are for 3-grams only (n-grams containing the word "like"). More information and samples...

LEVEL 2: All 2-, 3-, 4-, and 5-grams that occur at least three times. Samples here are just for words starting with [U]; they are case sensitive and include part of speech. More information and samples...
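To illustrate what an "at least three times" cutoff means in practice, here is a minimal Python sketch that counts n-grams in a token list and keeps only those at or above a minimum frequency. The function names, the toy sentence, and the whitespace tokenization are assumptions made for illustration; this is not how the COCA data were actually produced.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def frequent_ngrams(tokens, n, min_count=3):
    """Keep only the n-grams that occur at least min_count times."""
    counts = count_ngrams(tokens, n)
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy input, made up for this example; real corpus text would be
# tokenized far more carefully than a plain split on whitespace.
tokens = "it looks like rain and it looks like snow but it looks like sun".split()

result = frequent_ngrams(tokens, 3)
print(result)
```

With this toy input, only the 3-gram ("it", "looks", "like") reaches the threshold; every other 3-gram occurs once and is dropped.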

LEVEL 3: All 2-, 3-, and 4-grams, even those that occur just once (hundreds of millions of rows of data).
These n-gram sets are meant to be used in a database, where Sets 1 and 2 are joined together (via SQL commands) to produce something like Set 3. More information and samples...
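As a rough idea of what joining two sets via SQL can look like, the sketch below uses Python's built-in sqlite3 module. The page does not describe the contents of the sets, so the schema here is an assumption: one table of n-grams stored as integer word IDs with a frequency, and one lexicon table mapping each ID to a word and part of speech. All table names, column names, tags, and frequencies are invented for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE lexicon (word_id INTEGER PRIMARY KEY, word TEXT, pos TEXT);
        CREATE TABLE ngrams (freq INTEGER, w1 INTEGER, w2 INTEGER);
        """
    )
    # Toy rows, invented for this example (not real COCA values).
    conn.executemany(
        "INSERT INTO lexicon VALUES (?, ?, ?)",
        [(1, "looks", "VERB"), (2, "like", "PREP"), (3, "feel", "VERB")],
    )
    conn.executemany(
        "INSERT INTO ngrams VALUES (?, ?, ?)",
        [(120, 1, 2), (45, 3, 2)],
    )
    return conn


def expand_ngrams(conn):
    """Join the ID-based n-grams to the lexicon to get readable rows."""
    query = """
        SELECT n.freq, a.word, a.pos, b.word, b.pos
        FROM ngrams AS n
        JOIN lexicon AS a ON a.word_id = n.w1
        JOIN lexicon AS b ON b.word_id = n.w2
        ORDER BY n.freq DESC
    """
    return conn.execute(query).fetchall()


rows = expand_ngrams(build_demo_db())
for row in rows:
    print(row)
```

Each output row pairs a frequency with the words and part-of-speech tags for a 2-gram. The same pattern extends to longer n-grams by adding one lexicon join per word position.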


COHA: From the Corpus of Historical American English. These samples are case sensitive and they also include part of speech. More information and samples...


Spanish and Portuguese: For the Corpus del Español and the Corpus do Português. These samples are not case sensitive, but they do include part of speech. More information and samples...