N-GRAMS

In addition to the COCA- and COHA-based n-grams of English, we also have n-grams for Spanish (based on the Corpus del Español) and Portuguese (based on the Corpus do Português). Although the Spanish and Portuguese n-grams are based on much smaller corpora than COCA and COHA, they are still the only n-grams we are aware of that are based on large, genre-balanced corpora. The following are small samples of the n-grams data, each of which includes the 50,000 most frequent n-grams (along with part of speech):
The following are the approximate numbers of n-grams for each language:
The n-grams data for Spanish and Portuguese is available in two different formats:
Pricing for either format shown above, and for either Spanish or Portuguese, is:
To order, please email us at corpus@byu.edu. We will send you a short one-page NDA (non-disclosure agreement) for the desired product, and will then send a request for payment from PayPal. For a graduate student license, you must provide us with a university web page that lists you as a graduate student. For other academic licenses, the NDA you send back must come from a university email account. Thanks.