O'Reilly logo

Python 3 Text Processing with NLTK 3 Cookbook by Jacob Perkins

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Training a tagger-based chunker

Training a chunker can be a great alternative to manually specifying regular expression chunk patterns. Instead of a pain-staking process of trial and error to get the exact right patterns, we can use existing corpus data to train chunkers much like we did for part-of-speech tagging in the previous chapter.

How to do it...

As with the part-of-speech tagging, we'll use the treebank corpus data for training. But this time, we'll use the treebank_chunk corpus, which is specifically formatted to produce chunked sentences in the form of trees. These chunked_sents() methods will be used by a TagChunker class to train a tagger-based chunker. The TagChunker class uses a helper function, conll_tag_chunks(), to extract a list ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required