The Corpus of Historical
American English (COHA) contains 400 million words of text from
1810-2009, and all of the n-grams from the corpus (millions of rows of data) can be
freely downloaded. The n-gram files
contain all n-grams (including individual words) that occur at least three times in total
in the corpus, and they give the frequency of each of these n-grams in
each decade from the 1810s to the 2000s. This data can
be used offline to carry out powerful searches on a wide range of phenomena in
the history of American English.
For the 2-grams, 3-grams, and 4-grams, the number
listed below the column heading is the approximate number of unique n-grams (in
millions), followed by the total number of rows in the n-grams file.
The row count is higher because a given n-gram usually appears several times in the file, once
for each decade in which it occurs in the corpus.
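
To make the difference between unique n-grams and total rows concrete, here is a minimal Python sketch that groups per-decade rows back into one entry per n-gram. The column layout (frequency, the words of the n-gram, then decade, separated by tabs) is an assumption made for illustration, and the sample rows are invented; check the downloaded files for their actual format before adapting this.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in an ASSUMED layout: frequency, word1, word2, decade.
# The real download may order or name its columns differently.
SAMPLE = (
    "12\tlight\tblue\t1850\n"
    "30\tlight\tblue\t1950\n"
    "7\tlight\train\t1950\n"
)


def load_ngrams(handle):
    """Group per-decade rows into {ngram: {decade: frequency}}."""
    table = defaultdict(dict)
    # QUOTE_NONE keeps quotation marks in the text from being treated as CSV quoting.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        freq, *words, decade = row
        table[" ".join(words)][int(decade)] = int(freq)
    return dict(table)


table = load_ngrams(io.StringIO(SAMPLE))

# One n-gram can span several rows, so these two counts differ.
unique_ngrams = len(table)
total_rows = sum(len(by_decade) for by_decade in table.values())
```

With a real file, the same function could be given an open file handle in place of the in-memory sample, provided the columns match the assumed layout.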
Click on [*] below to see a small sample of
each n-gram set (entries for the word light). Downloading
the full n-gram sets is free, but we ask you to first
enter your name and email address.