Training a named entity chunker

You can train your own named entity chunker using the ieer corpus, which stands for Information Extraction: Entity Recognition. It takes a bit of extra work, though, because the ieer corpus has chunk trees but no part-of-speech tags for words.

How to do it...

Using the ieertree2conlltags() and ieer_chunked_sents() functions in chunkers.py, we can create named entity chunk trees from the ieer corpus to train the ClassifierChunker class created in the Classification-based chunking recipe:

import nltk.tag from nltk.chunk.util import conlltags2tree from nltk.corpus import ieer def ieertree2conlltags(tree, tag=nltk.tag.pos_tag): words, ents = zip(*tree.pos()) iobs = [] prev = None for ent in ents: if ent == tree.label(): ...

Get Natural Language Processing: Python and NLTK now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.