Unigram taggers are based on a simple statistical algorithm: for each
token, assign the tag that is most likely for that particular token.
For example, a unigram tagger will assign the tag
JJ to any occurrence of the word
frequent, since frequent is
used as an adjective (e.g., a frequent word) more
often than it is used as a verb (e.g., I frequent this
cafe). A unigram tagger behaves just like a lookup tagger
(Automatic Tagging), except there is a more
convenient technique for setting it up, called training. In the following code sample, we
train a unigram tagger, use it to tag a sentence, and then evaluate its accuracy:
>>> import nltk
>>> from nltk.corpus import brown
>>> brown_tagged_sents = brown.tagged_sents(categories='news')
>>> brown_sents = brown.sents(categories='news')
>>> unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
>>> unigram_tagger.tag(brown_sents[2007])
[('Various', 'JJ'), ('of', 'IN'), ('the', 'AT'), ('apartments', 'NNS'),
('are', 'BER'), ('of', 'IN'), ('the', 'AT'), ('terrace', 'NN'),
('type', 'NN'), (',', ','), ('being', 'BEG'), ('on', 'IN'), ('the', 'AT'),
('ground', 'NN'), ('floor', 'NN'), ('so', 'QL'), ('that', 'CS'),
('entrance', 'NN'), ('is', 'BEZ'), ('direct', 'JJ'), ('.', '.')]
>>> unigram_tagger.evaluate(brown_tagged_sents)
0.9349006503968017
We train a
UnigramTagger by specifying tagged sentence data as a parameter when we initialize the tagger. The training process involves inspecting the tag of each word and storing the most likely tag for any word in a dictionary that is stored inside the tagger.
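Under the hood, this training step amounts to counting (word, tag) pairs and keeping the most frequent tag per word. The following is a minimal sketch of that idea using a tiny hand-built corpus (not NLTK's actual implementation, whose internals differ):

```python
from collections import Counter, defaultdict

# A toy tagged corpus, standing in for brown.tagged_sents().
tagged_sents = [
    [('the', 'AT'), ('frequent', 'JJ'), ('word', 'NN')],
    [('I', 'PPSS'), ('frequent', 'VB'), ('this', 'DT'), ('cafe', 'NN')],
    [('a', 'AT'), ('frequent', 'JJ'), ('visitor', 'NN')],
]

# Count how often each tag occurs with each word.
tag_counts = defaultdict(Counter)
for sent in tagged_sents:
    for word, tag in sent:
        tag_counts[word][tag] += 1

# The "model" maps each word to the tag it most often received.
model = {word: counts.most_common(1)[0][0]
         for word, counts in tag_counts.items()}

print(model['frequent'])  # 'JJ' -- seen twice as JJ, once as VB
```

Tagging then reduces to a dictionary lookup per token, which is why a trained unigram tagger behaves like a lookup tagger.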