N-grams data

Corpus of Contemporary American English



SAMPLES: LEVEL 1

You can freely download files containing the 1,000,000 most frequent 2-, 3-, 4-, and 5-grams. For each n-gram size, you can choose between the following three formats. This sample page shows samples from the 3-grams only, but after a quick registration you can download all 12 files (four n-gram sizes, three formats each).

Note 1. A listing of the part of speech tags is available on a separate page (see also the notes at the bottom of that page regarding tags like ii31).

Note 2. To save space, only one part of speech is listed for each word, even if the tagger originally suggested two or three options. The PoS listed is the one that the tagger ranked most likely.

Note 3. The columns refer to: frequency, word1, word2, word3 (pos1, pos2, pos3).
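As a rough illustration of the column layout in Note 3, the following Python sketch reads one 3-gram line into a small record. It rests on assumptions not stated on this page: that fields are separated by whitespace (for example tabs), and that the three part of speech columns are either all present or all absent. The names Trigram and parse_trigram_line are hypothetical, and the sample values in the comments are invented, not taken from the corpus.

```python
from typing import NamedTuple, Optional, Tuple


class Trigram(NamedTuple):
    """One row of a 3-gram file (hypothetical record type)."""
    frequency: int
    words: Tuple[str, ...]
    pos: Optional[Tuple[str, ...]]  # None when the file has no PoS columns


def parse_trigram_line(line: str) -> Trigram:
    """Parse 'frequency word1 word2 word3 [pos1 pos2 pos3]'.

    Assumption: fields are whitespace-separated and words contain no spaces.
    """
    fields = line.split()
    if len(fields) not in (4, 7):
        raise ValueError(f"expected 4 or 7 fields, got {len(fields)}: {line!r}")
    frequency = int(fields[0])
    words = tuple(fields[1:4])
    # PoS columns are treated as optional trailing fields (an assumption).
    pos = tuple(fields[4:7]) if len(fields) == 7 else None
    return Trigram(frequency, words, pos)


# Invented example rows, for illustration only:
#   parse_trigram_line("1234\tone\tof\tthe")
#   parse_trigram_line("1234\tone\tof\tthe\tmc1\tio\tat")
```

If the downloaded files turn out to use a different delimiter or column order, only the split and the slice indices would need to change.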