N-grams data

from the 14 billion word iWeb corpus


Note: this data is based on corpora that were created solely by Mark Davies, Professor of Linguistics at Brigham Young University. As the result of an agreement between BYU and Mark Davies, all transactions regarding payments and licenses for this data are made solely with Mark Davies, rather than with BYU.


 
N-grams data from iWeb
(See sample)
Academic *    $195    License agreement
Commercial    $395    License agreement
Notes:
* Both wordID (for database) and "word" formats are included
* WordID = 100 million n-grams for each of 2-grams, 3-grams, 4-grams, and 5-grams
* Words = 50 million n-grams for each of 2-grams, 3-grams, 4-grams, and 5-grams
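
To illustrate how the two formats in the notes above might differ, here is a minimal Python sketch. The file layout is an assumption made for illustration only: this page does not specify the columns or delimiters, so the tab-separated rows, the `LEXICON` table, and the function names `parse_word_line` and `parse_id_line` are all hypothetical. Check the sample files for the actual layout.

```python
# Hypothetical layout: the real iWeb files may use different columns and delimiters.
from typing import Dict, List, Tuple

# Assumed lexicon table mapping wordID -> word form (invented sample values).
LEXICON: Dict[int, str] = {1: "the", 2: "of", 3: "end"}


def parse_word_line(line: str) -> Tuple[int, List[str]]:
    """Parse an assumed "word" format row: frequency, then tab-separated words."""
    freq, *words = line.rstrip("\n").split("\t")
    return int(freq), words


def parse_id_line(line: str, lexicon: Dict[int, str]) -> Tuple[int, List[str]]:
    """Parse an assumed wordID row and resolve each ID through the lexicon."""
    freq, *ids = line.rstrip("\n").split("\t")
    return int(freq), [lexicon[int(i)] for i in ids]


# The same 3-gram expressed in both assumed formats.
word_row = parse_word_line("120\tthe\tend\tof\n")
id_row = parse_id_line("120\t1\t3\t2\n", LEXICON)
print(word_row == id_row)
```

Under these assumptions, the wordID rows would suit a relational database (integer keys joined to a lexicon table), while the word rows can be read directly without a lookup.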


These are the steps to obtain the data:

1. Download and fill out the license agreement. It states that you will not give the data to anyone outside of your university or company (which also means that you cannot post it on the web). You just need to fill in your name and your company (if applicable), and then send it back to us as an attachment. * Note that you must use an academic email address (e.g. *.edu or *.ac.edu) for an academic license.

2. Once we receive the license agreement, we'll send you a request for payment from PayPal.

3. You make the payment with a credit card at PayPal. Note that you do not need a PayPal account to make the payment.

4. As soon as we receive confirmation of the payment, we'll send you the link to download the data.