Python Text Processing with NLTK 2.0 Cookbook by Jacob Perkins

Creating a categorized chunk corpus reader

NLTK provides both CategorizedPlaintextCorpusReader and CategorizedTaggedCorpusReader, but it has no categorized corpus reader for chunked corpora. In this recipe, we're going to make one.

Getting ready

Refer to the earlier recipe, Creating a chunked phrase corpus, for an explanation of ChunkedCorpusReader, and to the previous recipe for details on CategorizedPlaintextCorpusReader and CategorizedTaggedCorpusReader, both of which inherit from CategorizedCorpusReader.

How to do it...

We'll create a class called CategorizedChunkedCorpusReader that inherits from both CategorizedCorpusReader and ChunkedCorpusReader. It is heavily based on the CategorizedTaggedCorpusReader, and also provides three additional methods ...
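
The general shape of such a class can be sketched as follows. This is a minimal sketch, not the recipe's own listing, which is cut off in this excerpt. It assumes a recent NLTK 3 release, where CategorizedCorpusReader.__init__ takes the keyword-argument dict and removes its own options (cat_pattern, cat_map, cat_file) before the rest is passed on. The helper name _fileids_for, the file names, the sample sentences, and the cat_pattern value are all hypothetical. Making the three chunk accessors category-aware is likewise an assumption about what the additional methods might be.

```python
import os
import tempfile

from nltk.corpus.reader import CategorizedCorpusReader, ChunkedCorpusReader


class CategorizedChunkedCorpusReader(CategorizedCorpusReader, ChunkedCorpusReader):
    """Sketch: a chunked corpus reader whose chunk accessors accept categories."""

    def __init__(self, *args, **kwargs):
        # CategorizedCorpusReader consumes cat_pattern / cat_map / cat_file
        # from the kwargs dict, so the remainder is safe to hand to
        # ChunkedCorpusReader.
        CategorizedCorpusReader.__init__(self, kwargs)
        ChunkedCorpusReader.__init__(self, *args, **kwargs)

    def _fileids_for(self, fileids, categories):
        # Hypothetical helper: turn a categories argument into file ids.
        if fileids is not None and categories is not None:
            raise ValueError("Specify fileids or categories, not both")
        if categories is not None:
            return self.fileids(categories)
        return fileids

    def chunked_words(self, fileids=None, categories=None):
        return ChunkedCorpusReader.chunked_words(
            self, self._fileids_for(fileids, categories))

    def chunked_sents(self, fileids=None, categories=None):
        return ChunkedCorpusReader.chunked_sents(
            self, self._fileids_for(fileids, categories))

    def chunked_paras(self, fileids=None, categories=None):
        return ChunkedCorpusReader.chunked_paras(
            self, self._fileids_for(fileids, categories))


# Build a tiny throwaway corpus (made-up files) to exercise the reader.
root = tempfile.mkdtemp()
samples = {
    "news_a.chunk": "[The/DT cat/NN] sat/VBD on/IN [the/DT mat/NN]\n",
    "fiction_a.chunk": "[A/DT dragon/NN] slept/VBD\n",
}
for name, text in samples.items():
    with open(os.path.join(root, name), "w", encoding="utf8") as f:
        f.write(text)

# The category is taken from the part of the file name before the underscore.
reader = CategorizedChunkedCorpusReader(
    root, r".*\.chunk", cat_pattern=r"([a-z]+)_")

news_sents = list(reader.chunked_sents(categories="news"))
print(reader.categories())
print(news_sents[0])
```

Calling each parent's __init__ explicitly, rather than relying on super(), mirrors how the two constructors take their arguments differently: one receives the keyword dict as a single positional argument, the other receives the usual positional and keyword arguments.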
