Chapter 3. Creating Custom Corpora

In this chapter, we will cover the following recipes:

  • Setting up a custom corpus
  • Creating a wordlist corpus
  • Creating a part-of-speech tagged word corpus
  • Creating a chunked phrase corpus
  • Creating a categorized text corpus
  • Creating a categorized chunk corpus reader
  • Lazy corpus loading
  • Creating a custom corpus view
  • Creating a MongoDB-backed corpus reader
  • Corpus editing with file locking

Introduction

In this chapter, we'll cover how to use corpus readers and create custom corpora. If you want to train your own model, such as a part-of-speech tagger or text classifier, you will need to create a custom corpus to train on. Model training is covered in the subsequent chapters.

Now you'll learn how to use the existing corpus data ...
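Here is a minimal sketch of what "creating a custom corpus" means in practice, assuming NLTK 3 is installed. At its simplest, a custom corpus is a directory of files plus a corpus reader pointed at it. The file names (mywords.txt, sample.txt) and their contents are hypothetical and are written to a temporary directory for illustration. WordListCorpusReader treats each line as one entry. PlaintextCorpusReader matches files by a regular expression and tokenizes them with its default WordPunctTokenizer, so calling words() needs no extra data download.

```python
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader, WordListCorpusReader

# Build a throwaway corpus directory. These file names and contents are
# hypothetical examples, not data shipped with NLTK.
root = tempfile.mkdtemp()
with open(os.path.join(root, "mywords.txt"), "w", encoding="utf-8") as f:
    f.write("nltk\ncorpus\ncorpora\nwordnet\n")
with open(os.path.join(root, "sample.txt"), "w", encoding="utf-8") as f:
    f.write("Custom corpora are just files on disk.")

# Word list corpus: each non-blank line of the file is one word.
wordlist = WordListCorpusReader(root, ["mywords.txt"])
print(wordlist.words())  # ['nltk', 'corpus', 'corpora', 'wordnet']

# Plain-text corpus: every file under root matching the regex becomes a fileid.
plaintext = PlaintextCorpusReader(root, r"sample\.txt")
print(plaintext.fileids())  # ['sample.txt']

# words() tokenizes lazily with the reader's default WordPunctTokenizer.
print(list(plaintext.words()))
```

Calling sents() on the plain-text reader would additionally require the Punkt sentence tokenizer models, which is why this sketch sticks to words().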

From Python 3 Text Processing with NLTK 3 Cookbook.