N-grams data


You can download free lists containing the top 1,000,000 n-grams for each of the following: 2-grams (two-word sequences), 3-grams, 4-grams, and 5-grams. All n-grams also include part-of-speech information, so you can quickly and easily find, for example, all NOUN + NOUN sequences or all two-word strings where the first word ends with a certain letter and the second word starts with a different one.
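As a rough illustration of the two kinds of query mentioned above, here is a minimal Python sketch. The file layout is an assumption, not something documented on this page: it supposes one 2-gram per line with tab-separated fields in the order frequency, word 1, word 2, tag 1, tag 2. The tag names other than NOUN, the sample rows, and the function names (`read_bigrams`, `with_tags`, `ends_then_starts`) are all hypothetical, so adjust them to match the files you actually download.

```python
import io
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Bigram(NamedTuple):
    freq: int
    words: Tuple[str, str]
    tags: Tuple[str, str]


def read_bigrams(handle: Iterable[str]) -> Iterator[Bigram]:
    """Parse lines in the ASSUMED layout: freq, word1, word2, tag1, tag2 (tab-separated)."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines, headers, or anything that does not fit the assumed layout.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        freq, w1, w2, t1, t2 = fields
        yield Bigram(int(freq), (w1, w2), (t1, t2))


def with_tags(bigrams: Iterable[Bigram], first: str, second: str) -> List[Bigram]:
    """Keep 2-grams whose part-of-speech tags match, e.g. NOUN + NOUN."""
    return [b for b in bigrams if b.tags == (first, second)]


def ends_then_starts(bigrams: Iterable[Bigram], end: str, start: str) -> List[Bigram]:
    """Keep 2-grams where word 1 ends with `end` and word 2 starts with `start`."""
    return [
        b for b in bigrams
        if b.words[0].lower().endswith(end) and b.words[1].lower().startswith(start)
    ]


# Made-up rows standing in for a downloaded file; frequencies and tags are invented.
SAMPLE = (
    "120\thealth\tcare\tNOUN\tNOUN\n"
    "95\tof\tthe\tADP\tDET\n"
    "40\tbig\tdog\tADJ\tNOUN\n"
)

if __name__ == "__main__":
    rows = list(read_bigrams(io.StringIO(SAMPLE)))
    for b in with_tags(rows, "NOUN", "NOUN"):
        print(b.freq, *b.words)
    for b in ends_then_starts(rows, "g", "d"):
        print(b.freq, *b.words)
```

To run this against a real file, replace `io.StringIO(SAMPLE)` with an opened file handle, for example `open(path, encoding="utf-8")`, once you have checked the actual column order and encoding.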

To download these files, just fill in the following form. We suggest using an email address that you expect to keep for the next year or two: we plan to release a number of other (free) corpus-based frequency lists during that time, and we will announce them at the address you enter below.

Your name
Email address
I agree not to distribute this list to others, and not to develop any other frequency lists based on this data that will be sold commercially.