N-grams data

Corpus of Contemporary American English



You can download free n-gram lists containing the top 1,000,000 n-grams for each of the following: 2-grams (two-word sequences), 3-grams, 4-grams, and 5-grams. Every n-gram also includes part-of-speech information, so you can quickly and easily find, for example, all NOUN + NOUN sequences, or all two-word strings where the first word ends with a certain letter and the second word starts with a different one.
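As a rough illustration of the two kinds of search mentioned above, here is a minimal Python sketch. It is not based on the actual file layout, which this page does not describe: the column order (frequency, word1, word2, pos1, pos2), the tab delimiter, the "nn" prefix for noun tags, and the sample rows with their frequencies are all assumptions made up for the example, and the function names are hypothetical.

```python
import csv
import io

# Invented sample rows, NOT real corpus data. Assumed layout (tab-separated):
# frequency, word1, word2, pos1, pos2. The real files may be laid out differently.
SAMPLE_ROWS = (
    "120\tdata\tset\tnn1\tnn1\n"
    "95\tof\tthe\tio\tat\n"
    "80\tword\tlists\tnn1\tnn2\n"
    "60\tbig\tapple\tjj\tnn1\n"
)


def read_bigrams(handle):
    """Parse tab-separated 2-gram rows into (freq, word1, word2, pos1, pos2) tuples."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(int(freq), w1, w2, p1, p2) for freq, w1, w2, p1, p2 in reader]


def noun_noun(rows, noun_prefix="nn"):
    """Keep rows where both tags start with the assumed noun prefix."""
    return [
        row for row in rows
        if row[3].startswith(noun_prefix) and row[4].startswith(noun_prefix)
    ]


def by_letters(rows, first_ends_with, second_starts_with):
    """Keep rows where word1 ends with one letter and word2 starts with another."""
    return [
        row for row in rows
        if row[1].endswith(first_ends_with) and row[2].startswith(second_starts_with)
    ]


rows = read_bigrams(io.StringIO(SAMPLE_ROWS))
nn_pairs = noun_noun(rows)                 # NOUN + NOUN sequences
letter_pairs = by_letters(rows, "a", "s")  # word1 ends in "a", word2 starts with "s"
```

To try this on a downloaded file, you would replace the `io.StringIO(SAMPLE_ROWS)` argument with an opened file handle and adjust the column order and tag prefix to match what the file actually contains.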

To download these files, just fill in the following form. We recommend using an email address that you expect to keep for the next year or two: we plan to release a number of other (free) corpus-based frequency lists during that time, and we will let you know about them at the email address that you enter below.

Your name
Email address
I agree not to distribute this list to others, and not to develop any other frequency lists based on this data that will be sold commercially.