Corpus: genres |
Spoken, fiction, magazine, newspaper, academic. |
Same five genres
as before (with about 120-130 million words per genre), plus
three new genres:
-- Blog posts and other web pages
(120-130 million words for each of these two genres). So
much of what we consume nowadays comes from the web, and
these genres include many words that don't occur much
elsewhere (e.g. ebook, webpage, browsing, password,
template, meme, snarky, off-topic, downloadable,
open-source, updated, (to) monetize, upgrade, debunk,
archive, pirate).
-- TV and movie subtitles (130 million
words). This is by far the most informal language we've ever
had in COCA. Many studies (e.g. A, B, and C) show that the data from subtitles
agrees with native speaker intuitions about their language even
better than the data from actual everyday conversation (as in
the BNC). Until now, COCA lacked this kind of highly
informal language. |