Developing a chunker using POS-tagged corpora

Chunking is a technique used for entity detection. It segments a sentence into multi-token sequences and labels each of them.

To build a chunker, first define a chunk grammar: the set of rules that specify how tokens should be grouped into chunks.
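To make the idea of a chunk rule concrete, here is a minimal sketch that uses only Python's standard `re` module. It illustrates the concept and is not how any particular library implements chunking; the sentence, the rule, and the names `tagged`, `np_rule`, and `chunks` are all hypothetical.

```python
import re

# Hypothetical tagged sentence, used only for illustration.
tagged = [("the", "DT"), ("old", "JJ"), ("dog", "NN"), ("barked", "VBD")]

# Encode the tag sequence as one string: "<DT><JJ><NN><VBD>".
tag_string = "".join(f"<{tag}>" for _, tag in tagged)

# A rule in the spirit of a chunk grammar: an optional determiner,
# any number of adjectives, then a noun.
np_rule = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")

chunks = []
for match in np_rule.finditer(tag_string):
    # Map character offsets back to token indices by counting "<".
    start = tag_string[:match.start()].count("<")
    end = start + match.group().count("<")
    chunks.append([word for word, _ in tagged[start:end]])

print(chunks)
```

The rule is written over tags rather than words, so the same pattern applies to any sentence that has been POS-tagged with the same tag set.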

Consider the following example, which performs noun phrase (NP) chunking by defining a chunk rule:

>>> import nltk
>>> sent = [("A", "DT"), ("wise", "JJ"), ("small", "JJ"), ("girl", "NN"),
...         ("of", "IN"), ("village", "NN"), ("became", "VBD"), ("leader", "NN")]
>>> grammar = "NP: {<DT>?<JJ>*<NN><IN>?<NN>*}"
>>> find = ...
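As a sketch of how a grammar like this one can be applied, assuming NLTK is installed, the example below passes the grammar to `nltk.RegexpParser` and calls its `parse` method on the tagged sentence. The variable names `chunker`, `tree`, and `np_chunks` are illustrative and are not taken from the listing above.

```python
import nltk

# The tagged sentence and grammar from the example above.
sent = [("A", "DT"), ("wise", "JJ"), ("small", "JJ"), ("girl", "NN"),
        ("of", "IN"), ("village", "NN"), ("became", "VBD"), ("leader", "NN")]
grammar = "NP: {<DT>?<JJ>*<NN><IN>?<NN>*}"

# Build a regular-expression chunker from the grammar (illustrative name).
chunker = nltk.RegexpParser(grammar)

# parse() returns an nltk.Tree whose NP subtrees are the chunks found.
tree = chunker.parse(sent)

# Collect the words of every NP chunk as a plain string.
np_chunks = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]

print(tree)
print(np_chunks)
```

With this grammar, the tokens "A wise small girl of village" and "leader" should each be grouped into an NP chunk, while "became" (tagged VBD) stays outside any chunk because no part of the rule matches a verb tag.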
