O'Reilly logo

Python Text Processing with NLTK 2.0 Cookbook by Jacob Perkins

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Classification-based chunking

U nlike most part-of-speech taggers, the ClassifierBasedTagger learns from features. That means we can create a ClassifierChunker that can learn from both the words and part-of-speech tags, instead of only the part-of-speech tags as the TagChunker does.

How to do it...

For the ClassifierChunker, we don't want to discard the words from the training sentences, as we did in the previous recipe. Instead, to remain compatible with the 2-tuple (word, pos) format required for training a ClassiferBasedTagger, we convert the (word, pos, iob) 3-tuples from nltk.chunk.tree2conlltags() into ((word, pos), iob) 2-tuples using the chunk_trees2train_chunks() function. This code can be found in chunkers.py:

import nltk.chunk from nltk.tag ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required