N-grams data


Each of the following free n-gram files contains approximately the 1,000,000 most frequent n-grams from the one-billion-word Corpus of Contemporary American English (COCA). To download these files, you will first need to enter your name and email address. Thanks.
 
                   sample   2-grams    3-grams    4-grams    5-grams
wordID + lexicon   see      download   download   download   download
words (only)       see      download   download   download   download
words + PoS        see      download   download   download   download
 

 

Case-sensitive means that, for example, Bush and bush are separate entries. The n-grams with parts of speech let you find, for example, all of the tens of thousands of NOUN + NOUN sequences, or run any other search that refers to the part of speech of a word. For help with the part-of-speech tags, click here.
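
As a rough illustration of both points, the sketch below filters 2-grams by part of speech and then merges case variants. It rests on assumptions that are not stated on this page: that each line of a "words + PoS" file is tab-separated as frequency, then the n words, then the n tags, and that common-noun tags begin with "nn" (as in the CLAWS7 tag set). The function names and the sample rows are hypothetical, made up for the example, so check the layout against a real sample file before relying on it.

```python
from collections import Counter


def read_ngrams(lines, n):
    """Yield (freq, words, tags) from lines assumed to be laid out as
    freq <TAB> word1..wordN <TAB> tag1..tagN (an assumed layout)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 1 + 2 * n:
            continue  # skip lines that do not fit the assumed layout
        yield int(fields[0]), tuple(fields[1:1 + n]), tuple(fields[1 + n:])


def match_pos(ngrams, prefixes):
    """Keep n-grams whose tag at each position starts with the given prefix."""
    return [
        (freq, words)
        for freq, words, tags in ngrams
        if all(tag.startswith(p) for tag, p in zip(tags, prefixes))
    ]


def fold_case(ngrams):
    """Merge case variants (Bush / bush) by summing their frequencies."""
    counts = Counter()
    for freq, words, _tags in ngrams:
        counts[tuple(w.lower() for w in words)] += freq
    return counts


# Made-up rows in the assumed layout; not real COCA frequencies.
SAMPLE = (
    "120\tBush\tadministration\tnp1\tnn1\n"
    "45\tcoffee\ttable\tnn1\tnn1\n"
    "30\tthe\tbush\tat\tnn1\n"
    "12\tThe\tBush\tat\tnp1\n"
)

# NOUN + NOUN: both tags must start with the assumed common-noun prefix.
noun_noun = match_pos(read_ngrams(SAMPLE.splitlines(), 2), ("nn", "nn"))

# Case-insensitive view of the same data.
folded = fold_case(read_ngrams(SAMPLE.splitlines(), 2))
```

To run this on a downloaded file, pass an open file handle instead of `SAMPLE.splitlines()`. Matching on a tag prefix rather than an exact tag is a design choice that catches singular and plural nouns together; widen or narrow the prefixes to suit the search.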