N-GRAMS
from the COCA and COHA corpora of American English



The n-grams are available in a number of different formats:

Level 1
  Data: Most frequent 2, 3, and 4-grams
  Size: 1 million entries each
  Samples: See
  Price: Free

Level 2
  Data: All 2, 3, and 4-grams that occur at least 3 times. Available ±case sensitive, ±part of speech (more info)
  Size: 6.2 million, 11.9 million, and 8.3 million n-grams, respectively
  Samples: See
  Price: $55 (GRAD), $95 (ACAD), $195 (COM)

Level 3
  Data: All 2, 3, and 4-grams, including those that occur just 1-2 times
  Size: More than 155 million rows (for the 3-grams). The format allows users to specify word, PoS, and lemma.
  Samples: See
  Price: $95 (GRAD), $195 (ACAD), $395 (COM)

License: GRAD = graduate student, ACAD = other academic, COM = commercial
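To make the difference between the levels concrete, here is a minimal Python sketch of what "n-grams that occur at least 3 times" means, with an optional case-insensitive count. It is an illustration only: the function names (count_ngrams, filter_min_count) and the toy sentence are hypothetical, and this is not the procedure or file format used to build the COCA and COHA n-grams.

```python
from collections import Counter


def count_ngrams(tokens, n, case_sensitive=True):
    """Count every contiguous run of n tokens in a token list."""
    if not case_sensitive:
        tokens = [t.lower() for t in tokens]
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def filter_min_count(counts, min_count=3):
    """Keep only the n-grams whose frequency is at least min_count."""
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy input; a real corpus would be tokenized far more carefully.
tokens = (
    "the cat sat on the mat and the cat sat on the rug while the cat sat"
).split()

bigrams = count_ngrams(tokens, 2)        # every 2-gram, including rare ones
frequent = filter_min_count(bigrams, 3)  # only 2-grams seen 3 or more times

for gram, count in sorted(frequent.items()):
    print(" ".join(gram), count)
```

In this sketch, `bigrams` corresponds loosely to a list that keeps n-grams occurring only once or twice, while `frequent` corresponds to a list cut off at a minimum frequency of 3.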

To purchase the files (Levels 2 and 3):

1. Download and fill out the appropriate non-disclosure agreement (NDA) by clicking on one of the links in the blue sections above, and then send it back to us as an email attachment. For both GRAD and ACAD licenses, the NDA must be sent back from a university email account. For GRAD, you must also provide proof of status via a university web page (on the NDA).

2. Once we receive the NDA, we'll send you a request for payment from PayPal.

3. As soon as we receive confirmation of the payment, we'll send you the link to download the data.

Thanks for your interest.