Developing a chunker using POS-tagged corpora

Chunking is a process used to perform entity detection. It segments a sentence into multi-token sequences and labels each sequence. For example, it can group the tokens of a noun phrase into a single chunk labeled NP.
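Segmentation and labeling are often pictured with IOB tags: B marks the token that begins a chunk, I marks a token inside a chunk, and O marks a token outside any chunk. The minimal sketch below is an illustration only. It uses a hypothetical helper, `iob_to_chunks` (not an NLTK function), to group IOB-tagged tokens into labeled chunks.

```python
def iob_to_chunks(tagged):
    """Group (word, pos, iob) triples into (label, [words]) chunks.

    iob is 'O' for tokens outside any chunk, or 'B-LABEL' / 'I-LABEL'.
    """
    chunks = []
    current = None  # (label, words) of the chunk currently being built
    for word, _pos, iob in tagged:
        if iob == "O":
            # An outside token closes any open chunk
            if current:
                chunks.append(current)
                current = None
            continue
        prefix, label = iob.split("-", 1)
        # Start a new chunk on B-, or on an I- that does not continue the open chunk
        if prefix == "B" or current is None or current[0] != label:
            if current:
                chunks.append(current)
            current = (label, [word])
        else:
            current[1].append(word)
    if current:
        chunks.append(current)
    return chunks

sentence = [
    ("A", "DT", "B-NP"), ("wise", "JJ", "I-NP"), ("girl", "NN", "I-NP"),
    ("became", "VBD", "O"), ("leader", "NN", "B-NP"),
]
print(iob_to_chunks(sentence))
# [('NP', ['A', 'wise', 'girl']), ('NP', ['leader'])]
```

Segmentation decides where each chunk starts and ends. Labeling attaches a chunk type such as NP.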

To design a chunker, we first define a chunk grammar: a set of rules that specify how sequences of tagged tokens are grouped into chunks.
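A chunk grammar in the format used below is a string with one rule per line, written as `LABEL: {tag pattern}`. The braces enclose the sequence of POS tags to group. As an illustration only, the hypothetical helper `parse_chunk_grammar` below splits such a string into (label, pattern) pairs. It handles only simple `{...}` chunk rules. NLTK's own grammar syntax supports more rule types.

```python
import re

# Hypothetical helper (not part of NLTK): one rule per line, written as LABEL: {<TAG>...}
RULE = re.compile(r"^\s*(\w+)\s*:\s*\{(.*)\}\s*$")

def parse_chunk_grammar(grammar):
    """Split a chunk grammar string into (label, tag_pattern) pairs."""
    rules = []
    for line in grammar.splitlines():
        if not line.strip():
            continue  # skip blank lines
        m = RULE.match(line)
        if m is None:
            raise ValueError(f"unsupported rule: {line!r}")
        rules.append((m.group(1), m.group(2).strip()))
    return rules

grammar = """
NP: {<DT>?<JJ>*<NN>}
NP: {<NN><NN>}
"""
print(parse_chunk_grammar(grammar))
# [('NP', '<DT>?<JJ>*<NN>'), ('NP', '<NN><NN>')]
```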

Let's consider an example that performs noun phrase chunking by defining a chunk rule:

>>> import nltk
>>> sent = [("A", "DT"), ("wise", "JJ"), ("small", "JJ"), ("girl", "NN"), ("of", "IN"), ("village", "NN"), ("became", "VBD"), ("leader", "NN")]
>>> # NP rule: optional determiner, any adjectives, a noun, optional preposition, any further nouns
>>> grammar = "NP: {<DT>?<JJ>*<NN><IN>?<NN>*}"
>>> find = ...
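NLTK provides `nltk.RegexpParser` for grammars in this format. To show what a rule like `<DT>?<JJ>*<NN><IN>?<NN>*` actually matches, the pure-Python sketch below treats it as a regular expression over the sentence's tag sequence. The helpers `tag_pattern_to_regex` and `regexp_chunk` are hypothetical names used only for illustration and are not NLTK code. The sketch supports literal tag names only, with no wildcards such as `<NN.*>`.

```python
import re

def tag_pattern_to_regex(pattern):
    """Translate a tag pattern such as '<DT>?<JJ>*<NN>' into a regex over '<TAG>' strings.

    Each <TAG> becomes a non-capturing group so that ?, * and + quantify the whole tag.
    Tag names are escaped, so only literal tags are supported.
    """
    return re.sub(r"<([^<>]+)>", lambda m: "(?:<" + re.escape(m.group(1)) + ">)", pattern)

def regexp_chunk(tagged, pattern, label="NP"):
    """Apply one tag-pattern rule left to right and return (label, words) chunks."""
    regex = re.compile(tag_pattern_to_regex(pattern))
    # Encode the sentence as a tag string, e.g. '<DT><JJ><NN>', recording where each token starts
    tag_string = ""
    start_to_index = {}
    for i, (_word, tag) in enumerate(tagged):
        start_to_index[len(tag_string)] = i
        tag_string += f"<{tag}>"
    start_to_index[len(tag_string)] = len(tagged)  # end-of-sentence sentinel
    chunks = []
    for m in regex.finditer(tag_string):
        if m.end() == m.start():
            continue  # ignore empty matches from all-optional patterns
        i, j = start_to_index[m.start()], start_to_index[m.end()]
        chunks.append((label, [word for word, _tag in tagged[i:j]]))
    return chunks

sent = [("A", "DT"), ("wise", "JJ"), ("small", "JJ"), ("girl", "NN"),
        ("of", "IN"), ("village", "NN"), ("became", "VBD"), ("leader", "NN")]
grammar_pattern = "<DT>?<JJ>*<NN><IN>?<NN>*"
print(regexp_chunk(sent, grammar_pattern))
# [('NP', ['A', 'wise', 'small', 'girl', 'of', 'village']), ('NP', ['leader'])]
```

Because the quantifiers are greedy, the first chunk extends through "of village". The VBD tag of "became" cannot start a match, so "leader" forms a second, single-token NP.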


Natural Language Processing: Python and NLTK by Nitin Hardeniya, Jacob Perkins, Deepti Chopra, Nisheeth Joshi, Iti Mathur
