Training a named entity chunker
You can train your own named entity chunker using the ieer corpus, which stands for Information Extraction: Entity Recognition. It takes a bit of extra work, though, because the ieer corpus has chunk trees but no part-of-speech tags for words.
How to do it...
Using the ieertree2conlltags() and ieer_chunked_sents() functions in chunkers.py, we can create named entity chunk trees from the ieer corpus to train the ClassifierChunker class created in the Classification-based chunking recipe:
import nltk.tag
from nltk.chunk.util import conlltags2tree
from nltk.corpus import ieer

def ieertree2conlltags(tree, tag=nltk.tag.pos_tag):
    words, ents = zip(*tree.pos())
    iobs = []
    prev = None
    for ent in ents:
        if ent == tree.label():
            ...
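As a standalone illustration of the idea behind this kind of conversion, here is a minimal sketch that turns one entity label per word into IOB tags. It is not the book's implementation: the function name labels_to_iob, the root_label parameter, and the sample words and labels are all hypothetical, and the sketch assumes that each word is paired with the label of its enclosing subtree, with words outside any entity carrying the root label.

```python
def labels_to_iob(labels, root_label="S"):
    """Convert one entity label per word into IOB tags.

    Assumption: words outside any entity carry root_label. "S" is only a
    placeholder default; pass whatever label the root of your tree has.
    """
    iobs = []
    prev = None  # entity label of the previous word, or None if outside
    for label in labels:
        if label == root_label:
            # The word sits directly under the root: outside any chunk.
            iobs.append("O")
            prev = None
        elif label == prev:
            # Same entity label as the previous word: continue the chunk.
            iobs.append("I-" + label)
        else:
            # A different entity label: begin a new chunk.
            iobs.append("B-" + label)
            prev = label
    return iobs


# Hypothetical sample data, not taken from the ieer corpus.
words = ["Jane", "Doe", "flew", "to", "Paris", "today"]
labels = ["PERSON", "PERSON", "S", "S", "LOCATION", "S"]
tagged = list(zip(words, labels_to_iob(labels)))
```

One limitation of working from labels alone: two adjacent entities of the same type are indistinguishable from a single longer entity, so this sketch merges them into one chunk.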