Appendix A. Penn Treebank Part-of-speech Tags

The following is a table of all the part-of-speech tags that occur in the treebank corpus distributed with NLTK. The tags and counts shown here were acquired using the following code:

>>> from nltk.probability import FreqDist
>>> from nltk.corpus import treebank
>>> fd = FreqDist()
>>> for word, tag in treebank.tagged_words():
...	   fd[tag] += 1
>>> fd.items()

The FreqDist fd contains all the counts shown here for every tag in the treebank corpus. You can inspect each tag count individually, by doing fd[tag], for example, fd['DT']. Punctuation tags are also shown, along with special tags such as -NONE-, which signifies that the part-of-speech tag is unknown. Descriptions of most of the tags can be found ...

Get Python 3 Text Processing with NLTK 3 Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.