○ The IOB format categorizes tagged tokens as
I, O, and B. Why are three tags
necessary? What problem would be caused if we used
I and O tags exclusively?
○ Write a tag pattern to match noun phrases containing plural
head nouns, e.g., a phrase ending in
positions/NNS. Try to do this by generalizing the tag
pattern that handled singular noun phrases.
○ Pick one of the three chunk types in the CoNLL-2000 Chunking
Corpus. Inspect the data and try to observe any patterns in the POS
tag sequences that make up this kind of chunk. Develop a simple
chunker using the regular expression chunker
nltk.RegexpParser. Discuss any tag
sequences that are difficult to chunk reliably.
○ An early definition of chunk was the material that occurs between chinks. Develop a chunker that starts by putting the whole sentence in a single chunk, and then does the rest of its work solely by chinking. Determine which tags (or tag sequences) are most likely to make up chinks with the help of your own utility program. Compare the performance and simplicity of this approach relative to a chunker based entirely on chunk rules.
○ Write a tag pattern to cover noun phrases that contain
editor/NN. Add these patterns to the grammar, one per line. Test your work using some tagged sentences of your own devising. ...
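
As a starting point for the IOB exercise above, it may help to see how IOB tags are turned back into chunks. The following is a minimal sketch in plain Python, not NLTK's own implementation: the helper name `iob_to_chunks` is hypothetical, the example sentence is made up for illustration, and the tags are assumed to follow the CoNLL-style `B-NP` / `I-NP` / `O` convention.

```python
def iob_to_chunks(tagged):
    """Group (word, pos, iob) triples into (chunk_type, [words]) pairs.

    Hypothetical helper for illustration; assumes tags look like
    "B-NP", "I-NP", or "O".
    """
    chunks = []
    current = None  # the (chunk_type, words) pair being built, or None
    for word, pos, iob in tagged:
        if iob == "O":
            # Outside any chunk: close whatever chunk was open.
            current = None
            continue
        prefix, chunk_type = iob.split("-", 1)
        # B always opens a new chunk; I continues the open chunk
        # when one of the same type exists, otherwise it opens one.
        if prefix == "B" or current is None or current[0] != chunk_type:
            current = (chunk_type, [])
            chunks.append(current)
        current[1].append(word)
    return chunks


# Illustrative sentence (assumed data, not taken from a corpus).
sentence = [
    ("We", "PRP", "B-NP"),
    ("saw", "VBD", "O"),
    ("the", "DT", "B-NP"),
    ("yellow", "JJ", "I-NP"),
    ("dog", "NN", "I-NP"),
]

chunks = iob_to_chunks(sentence)
print(chunks)
```

To explore the exercise, try a sentence in which two chunks of the same type stand next to each other, then change every `B-` prefix to `I-` and compare the output.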