At the end of the previous chapter, Chapter 4, Part-of-speech Tagging, we introduced NLTK-Trainer and the
train_tagger.py script. In this recipe, we will cover the script for training chunkers:
train_tagger.py, the only required argument to
train_chunker.py is the name of a corpus. In this case, we need a corpus that provides a
chunked_sents() method, such as
treebank_chunk. Here's an example of running
$ python train_chunker.py treebank_chunk loading treebank_chunk 4009 chunks, training on 4009 ...