O'Reilly logo

Python 3 Text Processing with NLTK 3 Cookbook by Jacob Perkins

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Training a chunker with NLTK-Trainer

At the end of the previous chapter, Chapter 4, Part-of-speech Tagging, we introduced NLTK-Trainer and the train_tagger.py script. In this recipe, we will cover the script for training chunkers: train_chunker.py.

Note

You can find NLTK-Trainer at https://github.com/japerk/nltk-trainer and the online documentation at http://nltk-trainer.readthedocs.org/.

How to do it...

As with train_tagger.py, the only required argument to train_chunker.py is the name of a corpus. In this case, we need a corpus that provides a chunked_sents() method, such as treebank_chunk. Here's an example of running train_chunker.py on treebank_chunk:

$ python train_chunker.py treebank_chunk
loading treebank_chunk
4009 chunks, training on 4009 ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required