Further Reading

Extra materials for this chapter are posted at http://www.nltk.org/, including links to freely available resources on the Web. For more examples of tagging with NLTK, please see the Tagging HOWTO at http://www.nltk.org/howto. Chapters 4 and 5 of (Jurafsky & Martin, 2008) contain more advanced material on n-grams and part-of-speech tagging. Other approaches to tagging involve machine learning methods (Chapter 6). In Chapter 7, we will see a generalization of tagging called chunking in which a contiguous sequence of words is assigned a single tag.

For tagset documentation, see nltk.help.upenn_tagset() and nltk.help.brown_tagset(). Lexical categories are introduced in linguistics textbooks, including those listed in Chapter 1 of this book.

There are many other kinds of tagging. Words can be tagged with directives to a speech synthesizer, indicating which words should be emphasized. Words can be tagged with sense numbers, indicating which sense of the word was used. Words can also be tagged with morphological features. Examples of each of these kinds of tags are shown in the following list. For space reasons, we show the tag for only a single word. Note also that the first two examples use XML-style tags, where elements in angle brackets enclose the word that is tagged.

Speech Synthesis Markup Language (W3C SSML)

That is a <emphasis>big</emphasis> car!

SemCor: Brown Corpus tagged with WordNet senses

Space in any <wf pos="NN" lemma="form" wnsn="4">form</wf> is completely measured ...
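Because the SSML and SemCor examples use XML-style tags, the tagged word and its attributes can be recovered with a standard XML parser. The following is a minimal sketch using Python's standard library rather than NLTK. The `<s>` wrapper element and the `tagged_words` helper are assumptions introduced for illustration only; they are not part of SSML, SemCor, or NLTK.

```python
import xml.etree.ElementTree as ET

def tagged_words(fragment, tag):
    """Return (word, attributes) pairs for every <tag> element in fragment.

    The fragment is wrapped in a hypothetical <s> root element so that
    text with inline tags parses as a well-formed XML document.
    """
    root = ET.fromstring('<s>' + fragment + '</s>')
    return [(el.text, dict(el.attrib)) for el in root.iter(tag)]

# The two XML-style examples from the list above.
ssml_line = 'That is a <emphasis>big</emphasis> car!'
semcor_line = ('Space in any <wf pos="NN" lemma="form" wnsn="4">form</wf> '
               'is completely measured ...')

ssml = tagged_words(ssml_line, 'emphasis')
semcor = tagged_words(semcor_line, 'wf')

print(ssml)    # the emphasized word; <emphasis> carries no attributes here
print(semcor)  # the word with its part-of-speech, lemma, and sense number
```

In the SemCor pair, the `wnsn` attribute holds the sense number described above, and `pos` holds the part-of-speech tag. Note that the attribute values come back as strings.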
