N-grams data

Corpus of Contemporary American English



Each of the following free n-gram files contains approximately the 1,000,000 most frequent n-grams from the Corpus of Contemporary American English (COCA). To download these files, you will first need to enter your name and email address. Thanks.
 
Version                                  Sample   2-grams    3-grams    4-grams    5-grams
Not case sensitive                       see      download   download   download   download
Case sensitive                           see      download   download   download   download
Case sensitive, with part of speech      see      download   download   download   download

 

Case sensitive means that, for example, Bush and bush are separate entries. The n-grams with part-of-speech tags let you find, for example, all of the tens of thousands of NOUN + NOUN sequences, or run any other search that refers to a word's part of speech. For help with the part-of-speech tags, click here.
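
As a rough illustration of the two ideas above (case sensitivity and searching by part of speech), here is a minimal Python sketch. It rests on assumptions that are not stated on this page: that each line of a tagged file is tab-separated, with the frequency first, then the n words, then the n part-of-speech tags, and that common-noun tags begin with "nn". The sample lines and their counts are made up for the example, and the function names are hypothetical. Check the "see" sample for the real layout before relying on any of this.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple, Tuple


class NGram(NamedTuple):
    freq: int
    words: Tuple[str, ...]
    tags: Tuple[str, ...]


def parse_line(line: str, n: int) -> NGram:
    """Parse one line, ASSUMING the layout: freq, n words, n tags (tab-separated)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 1 + 2 * n:
        raise ValueError(f"expected {1 + 2 * n} fields, got {len(fields)}: {line!r}")
    return NGram(int(fields[0]), tuple(fields[1:1 + n]), tuple(fields[1 + n:]))


def with_tag_prefixes(ngrams: Iterable[NGram], prefixes: Tuple[str, ...]) -> Iterator[NGram]:
    """Yield n-grams whose i-th tag starts with the i-th prefix (case-insensitive)."""
    for gram in ngrams:
        if all(tag.lower().startswith(p) for tag, p in zip(gram.tags, prefixes)):
            yield gram


def fold_case(ngrams: Iterable[NGram]) -> Counter:
    """Merge case-sensitive entries (Bush / bush) into lowercase totals."""
    counts: Counter = Counter()
    for gram in ngrams:
        counts[tuple(w.lower() for w in gram.words)] += gram.freq
    return counts


# Made-up sample lines in the ASSUMED format; real counts and tags will differ.
sample = [
    "120\tBush\tadministration\tnp1\tnn1",
    "45\tbush\tfire\tnn1\tnn1",
    "30\tthe\tbush\tat\tnn1",
    "12\tBush\tfire\tnn1\tnn1",
]

parsed = [parse_line(line, n=2) for line in sample]

# NOUN + NOUN: both tags start with "nn" (an assumption about the tagset).
noun_noun = list(with_tag_prefixes(parsed, ("nn", "nn")))
for gram in noun_noun:
    print(gram.freq, " ".join(gram.words))

# Collapse the case-sensitive entries into one case-insensitive count.
print(fold_case(noun_noun))
```

For a real file, the same functions could be applied line by line to an open file handle instead of the sample list, which avoids loading a million entries into memory at once.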