O'Reilly logo

Python 3 Text Processing with NLTK 3 Cookbook by Jacob Perkins

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Classification-based chunking

Unlike most part-of-speech taggers, the ClassifierBasedTagger class learns from features. That means we can create a ClassifierChunker class that can learn from both the words and part-of-speech tags, instead of only the part-of-speech tags as the TagChunker class does.

How to do it...

For the ClassifierChunker class, we don't want to discard the words from the training sentences as we did in the previous recipe. Instead, to remain compatible with the 2-tuple (word, pos) format required for training a ClassiferBasedTagger class, we convert the (word, pos, iob) 3-tuples from tree2conlltags() into ((word, pos), iob) 2-tuples using the chunk_trees2train_chunks() function. This code can be found in chunkers.py:

from nltk.chunk ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required