Appendix A. Penn Treebank Part-of-speech Tags
The following is a table of all the part-of-speech tags that occur in the treebank corpus distributed with NLTK. The tags and counts shown here were acquired using the following code:
>>> from nltk.probability import FreqDist
>>> from nltk.corpus import treebank
>>> fd = FreqDist()
>>> for word, tag in treebank.tagged_words():
...     fd[tag] += 1
>>> fd.items()
The FreqDist fd contains all the counts shown here for every tag in the treebank corpus. You can inspect the count for an individual tag with fd[tag], for example, fd['DT']. Punctuation tags are also shown, along with special tags such as -NONE-, which signifies that the part-of-speech tag is unknown. Descriptions of most of the tags can be found ...
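The counting and lookup steps described above can be sketched without downloading the corpus. This is a minimal illustration, not the code used to build the table: it uses collections.Counter from the standard library (in NLTK 3, FreqDist is a subclass of Counter, so the same lookups apply to fd), and both the helper name count_tags and the small hand-made sample are assumptions standing in for treebank.tagged_words().

```python
from collections import Counter


def count_tags(tagged_words):
    """Count how often each tag occurs in an iterable of (word, tag) pairs.

    count_tags is a hypothetical helper name, not part of NLTK.
    """
    return Counter(tag for _word, tag in tagged_words)


# Hypothetical hand-made sample; with NLTK and the corpus installed,
# treebank.tagged_words() could be passed in instead.
sample = [
    ("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"),
    ("the", "DT"), ("mat", "NN"), (".", "."),
]

tag_counts = count_tags(sample)

# Look up the count for a single tag, as with fd['DT'].
print(tag_counts["DT"])

# A tag that never occurred reports a count of 0 instead of raising KeyError.
print(tag_counts["-NONE-"])

# List tags from most to least frequent.
print(tag_counts.most_common())
```

Passing a generator to the constructor replaces the explicit loop with fd[tag] += 1, and most_common() gives the tags in frequency order, which is convenient when comparing against a table of counts.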