N-grams data

Corpus of Contemporary American English




The n-grams are available at three levels, which differ in coverage, size, and price:

Level | Data | Size | Samples | Price (GRAD / ACAD / COM)
1 | The most frequent 2-, 3-, and 4-grams | 1 million entries each | Available | Free
2 | All 2-, 3-, and 4-grams that occur at least 3 times; available with or without case sensitivity, and with or without part-of-speech tags | 6.2 million, 11.9 million, and 8.3 million n-grams, respectively | Available | $55 / $95 / $195
3 | All 2-, 3-, and 4-grams, including those that occur only 1-2 times | More than 155 million rows (for the 3-grams); the format allows users to specify word, PoS, and lemma | Available | $95 / $195 / $395

License types: GRAD = graduate student, ACAD = other academic, COM = commercial.
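Because the Level 2 files come in case-sensitive and case-insensitive versions, it can help to see what "case-insensitive" means in practice: case variants of the same n-gram are merged and their frequencies summed. The sketch below assumes a hypothetical tab-separated layout of frequency followed by the words of the n-gram (check the actual column order in the files you receive), and the sample rows are invented toy data, not real COCA counts.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Iterator, TextIO


def read_ngrams(lines: TextIO) -> Iterator[tuple[int, tuple[str, ...]]]:
    """Yield (frequency, n-gram) pairs from tab-separated rows.

    ASSUMED layout (hypothetical): frequency<TAB>word1<TAB>word2[<TAB>...].
    The column order in the distributed files may differ.
    """
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        freq, *words = row
        yield int(freq), tuple(words)


def fold_case(rows: Iterable[tuple[int, tuple[str, ...]]]) -> Counter:
    """Merge case variants (e.g. "The end" and "the end") by summing frequencies."""
    totals: Counter = Counter()
    for freq, words in rows:
        totals[tuple(w.lower() for w in words)] += freq
    return totals


def starting_with(totals: Counter, first_word: str) -> list[tuple[tuple[str, ...], int]]:
    """Return n-grams whose first word matches (case-insensitively), most frequent first."""
    hits = [(ng, f) for ng, f in totals.items() if ng[0] == first_word.lower()]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Invented toy rows for illustration only (not real corpus counts).
sample = io.StringIO(
    "120\tthe\tend\n"
    "30\tThe\tend\n"
    "45\tthe\tway\n"
)
totals = fold_case(read_ngrams(sample))
print(starting_with(totals, "the"))
# prints [(('the', 'end'), 150), (('the', 'way'), 45)]
```

Folding case yourself from a case-sensitive file gives the same totals as a case-insensitive version would, but the reverse is not possible, which is one reason to prefer the case-sensitive version if you are unsure which you need.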

To purchase the files (Levels 2 and 3):

1. Download and fill out the non-disclosure agreement (NDA) for your license type (GRAD, ACAD, or COM), and send it back to us as an email attachment. For both GRAD and ACAD licenses, the NDA must be sent from a university email account. For a GRAD license, you must also provide proof of student status by listing, on the NDA, a university web page that confirms it.

2. Once we receive the NDA, we'll send you a payment request via PayPal.

3. As soon as we receive confirmation of the payment, we'll send you the link to download the data.

Thanks for your interest.