Extracting proper noun chunks
A simple way to do named entity extraction is to chunk all proper nouns (tagged with NNP). We can tag these chunks as NAME, since the definition of a proper noun is the name of a person, place, or thing.
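To make the idea concrete without any NLTK machinery, here is a minimal, dependency-free sketch that treats each maximal run of consecutive NNP-tagged tokens as one NAME chunk. The function name `proper_noun_chunks` and the hand-tagged sentence are illustrative assumptions, not part of NLTK or the corpus. Because it checks only the exact tag NNP, plural proper nouns (NNPS) are not collected.

```python
from itertools import groupby
from typing import List, Tuple

# A tagged token is a (word, part-of-speech tag) pair
TaggedToken = Tuple[str, str]


def proper_noun_chunks(tagged: List[TaggedToken]) -> List[List[TaggedToken]]:
    """Return every maximal run of consecutive NNP tokens as one chunk (hypothetical helper)."""
    chunks = []
    # groupby splits the sentence into runs sharing the same "is this token NNP?" key
    for is_nnp, run in groupby(tagged, key=lambda tok: tok[1] == 'NNP'):
        if is_nnp:
            chunks.append(list(run))
    return chunks


# Hand-tagged example sentence (illustrative, not read from a corpus)
sentence = [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('will', 'MD'),
            ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('Nov.', 'NNP'),
            ('29', 'CD'), ('.', '.')]

print(proper_noun_chunks(sentence))
# [[('Pierre', 'NNP'), ('Vinken', 'NNP')], [('Nov.', 'NNP')]]
```

Grouping on a boolean key keeps adjacent proper nouns together while any other tag closes the current chunk, which is exactly the "one or more NNP in a row" behavior.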
How to do it...
Using the RegexpParser class, we can create a very simple grammar that combines all proper nouns into a NAME chunk. Then, we can test this on the first tagged sentence of treebank_chunk to compare the results with the previous recipe:
>>> from nltk.chunk import RegexpParser
>>> from nltk.corpus import treebank_chunk
>>> chunker = RegexpParser(r'''
... NAME:
...     {<NNP>+}
... ''')
>>> sub_leaves(chunker.parse(treebank_chunk.tagged_sents()[0]), 'NAME')
[[('Pierre', 'NNP'), ('Vinken', 'NNP')], [('Nov.', 'NNP')]]
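The session relies on sub_leaves(), which is not defined in this section. The sketch below is a self-contained version under that assumption: `sub_leaves` here is a hypothetical re-implementation that returns the leaves of every subtree carrying the given label, and the sentence is tagged by hand so that no corpus download is needed.

```python
from nltk.chunk import RegexpParser


def sub_leaves(tree, label):
    """Hypothetical stand-in: leaves of every subtree whose label matches `label`."""
    return [t.leaves() for t in tree.subtrees(lambda s: s.label() == label)]


# Same grammar as above: one or more consecutive NNP tags form a NAME chunk
chunker = RegexpParser(r'''
NAME:
    {<NNP>+}
''')

# Hand-tagged example sentence (illustrative, not read from treebank_chunk)
sentence = [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('will', 'MD'),
            ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('Nov.', 'NNP'),
            ('29', 'CD'), ('.', '.')]

tree = chunker.parse(sentence)  # root of the returned Tree is labeled 'S' by default
print(sub_leaves(tree, 'NAME'))
```

Since the root is labeled S rather than NAME, only the chunk subtrees pass the filter, and subtrees() visits them in sentence order.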
Although we get Nov. as a NAME chunk, this isn't a wrong result, as Nov. is ...