  • Morgen Kimbrell thinks this is interesting:

American National Corpus (ANC) | 2003 | 22 million words | Spoken and written texts
Corpus of Contemporary American English (COCA) | 2008 | 425 million words | Spoken, fiction, popular magazine, and academic texts

From

Natural Language Annotation for Machine Learning

Note

Existing corpora for use as data input