Cover by Edward Loper, Steven Bird, Ewan Klein

Exercises

  1. ○ The IOB format categorizes tagged tokens as I, O, and B. Why are three tags necessary? What problem would be caused if we used I and O tags exclusively?

  2. ○ Write a tag pattern to match noun phrases containing plural head nouns, e.g., many/JJ researchers/NNS, two/CD weeks/NNS, both/DT new/JJ positions/NNS. Try to do this by generalizing the tag pattern that handled singular noun phrases.

  3. ○ Pick one of the three chunk types in the CoNLL-2000 Chunking Corpus. Inspect the data and try to observe any patterns in the POS tag sequences that make up this kind of chunk. Develop a simple chunker using the regular expression chunker nltk.RegexpParser. Discuss any tag sequences that are difficult to chunk reliably.

  4. ○ An early definition of chunk was the material that occurs between chinks. Develop a chunker that starts by putting the whole sentence in a single chunk, and then does the rest of its work solely by chinking. Determine which tags (or tag sequences) are most likely to make up chinks with the help of your own utility program. Compare the performance and simplicity of this approach relative to a chunker based entirely on chunk rules.

  5. Write a tag pattern to cover noun phrases that contain gerunds, e.g., the/DT receiving/VBG end/NN, assistant/NN managing/VBG editor/NN. Add these patterns to the grammar, one per line. Test your work using some tagged sentences of your own devising. ...
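
As a starting point for thinking about Exercise 1, the following minimal sketch decodes IOB-tagged tokens back into chunks and compares the result with a version in which every B tag has been replaced by I. The helper names (iob_to_chunks, io_only) and the hand-tagged sentence are assumptions made up for this illustration, not part of NLTK or the CoNLL-2000 data, and the decoder ignores chunk types for brevity.

```python
def iob_to_chunks(triples):
    """Group (word, pos, iob) triples into chunks, each a list of words.

    Simplifying assumption: the chunk type after "B-"/"I-" is ignored,
    so this sketch only suits data with a single chunk type.
    """
    chunks = []
    current = None  # words of the chunk being built; None when outside a chunk
    for word, _pos, iob in triples:
        if iob == "O":
            current = None  # token is outside any chunk
        elif iob.startswith("B-") or current is None:
            current = [word]  # a B tag (or an I tag after O) opens a new chunk
            chunks.append(current)
        else:
            current.append(word)  # an I tag continues the open chunk
    return chunks


def io_only(triples):
    """Rewrite every B-X tag as I-X, simulating a two-tag (I/O) scheme."""
    return [
        (word, pos, ("I-" + iob[2:]) if iob.startswith("B-") else iob)
        for word, pos, iob in triples
    ]


# Hypothetical hand-tagged sentence with two adjacent noun phrases.
sentence = [
    ("gave", "VBD", "O"),
    ("the", "DT", "B-NP"),
    ("dog", "NN", "I-NP"),
    ("a", "DT", "B-NP"),
    ("bone", "NN", "I-NP"),
]

with_b = iob_to_chunks(sentence)
without_b = iob_to_chunks(io_only(sentence))

print(with_b)     # [['the', 'dog'], ['a', 'bone']]
print(without_b)  # [['the', 'dog', 'a', 'bone']]
```

With all three tags the two noun phrases are recovered separately; with only I and O the boundary between them is lost and they are decoded as a single chunk.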
