Training a unigram part-of-speech tagger

A unigram generally refers to a single token. Therefore, a unigram tagger only uses a single word as its context for determining the part-of-speech tag.
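To make the idea concrete, here is a minimal plain-Python sketch of a unigram tagger. It is not NLTK's implementation. It counts how often each word appears with each tag and gives every word the tag it received most often in training. Words never seen in training get None. The function names train_unigram and tag_unigram and the toy training sentences are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Build a word -> most frequent tag table from sentences of (word, tag) tuples."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    # Keep only the most frequent tag seen for each word.
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}


def tag_unigram(table, tokens):
    """Tag each token using only the token itself as context; unknown words get None."""
    return [(token, table.get(token)) for token in tokens]


# Hypothetical toy training data: "run" is tagged NN once and VBP twice.
train = [
    [("the", "DT"), ("run", "NN"), ("ended", "VBD")],
    [("they", "PRP"), ("run", "VBP"), ("home", "NN")],
    [("we", "PRP"), ("run", "VBP"), ("fast", "RB")],
]
table = train_unigram(train)
print(tag_unigram(table, ["the", "run", "was", "fast"]))
# [('the', 'DT'), ('run', 'VBP'), ('was', None), ('fast', 'RB')]
```

Note that "run" is always tagged VBP, even after "the", because a unigram tagger never looks at neighboring words.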

UnigramTagger inherits from NgramTagger, which is a subclass of ContextTagger, which inherits from SequentialBackoffTagger. In other words, UnigramTagger is a context-based tagger whose context is a single word, or unigram.
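The sketch below mirrors that inheritance chain with simplified, hypothetical classes (SimpleBackoffTagger, SimpleContextTagger, SimpleNgramTagger, SimpleUnigramTagger). It is not NLTK's source, and the tag tables are supplied by hand instead of being trained. What it shows is that each subclass only changes how the context is computed. An n-gram context is the previous n-1 tags plus the current word, a unigram context is just the current word, and any token a tagger cannot handle is passed down the backoff chain.

```python
class SimpleBackoffTagger:
    """Tags left to right; each token goes to the first tagger in the chain that returns a tag."""

    def __init__(self, backoff=None):
        self._taggers = [self] + (backoff._taggers if backoff else [])

    def tag(self, tokens):
        history = []  # tags already assigned to earlier tokens
        for index in range(len(tokens)):
            history.append(self.tag_one(tokens, index, history))
        return list(zip(tokens, history))

    def tag_one(self, tokens, index, history):
        for tagger in self._taggers:
            tag = tagger.choose_tag(tokens, index, history)
            if tag is not None:
                return tag
        return None  # no tagger in the chain knew this token

    def choose_tag(self, tokens, index, history):
        raise NotImplementedError


class SimpleContextTagger(SimpleBackoffTagger):
    """Looks the tag up in a context -> tag table; subclasses define the context."""

    def __init__(self, table, backoff=None):
        super().__init__(backoff)
        self._table = table

    def context(self, tokens, index, history):
        raise NotImplementedError

    def choose_tag(self, tokens, index, history):
        return self._table.get(self.context(tokens, index, history))


class SimpleNgramTagger(SimpleContextTagger):
    """Context is the previous n-1 tags plus the current word."""

    def __init__(self, n, table, backoff=None):
        super().__init__(table, backoff)
        self._n = n

    def context(self, tokens, index, history):
        return tuple(history[max(0, index - self._n + 1):index]), tokens[index]


class SimpleUnigramTagger(SimpleNgramTagger):
    """Context is the current word only."""

    def __init__(self, table, backoff=None):
        super().__init__(1, table, backoff)

    def context(self, tokens, index, history):
        return tokens[index]


unigram = SimpleUnigramTagger({"the": "DT", "dog": "NN", "runs": "VBZ"})
bigram = SimpleNgramTagger(2, {(("DT",), "run"): "NN"}, backoff=unigram)
print(unigram.tag(["the", "run", "runs"]))  # [('the', 'DT'), ('run', None), ('runs', 'VBZ')]
print(bigram.tag(["the", "run", "runs"]))   # [('the', 'DT'), ('run', 'NN'), ('runs', 'VBZ')]
```

The unigram tagger alone cannot tag "run" because its context ignores the preceding tag. The bigram tagger resolves it from the previous DT tag and falls back to the unigram tagger for the other words.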

How to do it...

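UnigramTagger can be trained by passing it a list of tagged sentences at initialization. Each sentence is a list of (word, tag) tuples, which is the format returned by treebank.tagged_sents().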

>>> from nltk.tag import UnigramTagger
>>> from nltk.corpus import treebank
>>> train_sents = treebank.tagged_sents()[:3000]
>>> tagger = UnigramTagger(train_sents)
>>> treebank.sents()[0]
['Pierre', 'Vinken', ',', '61', 'years', 'old', ...
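To try the same constructor without downloading the treebank corpus, you can pass it any list of tagged sentences. The small training set below is made up for illustration, and the example assumes NLTK is installed. Because no backoff tagger is given, a word not seen during training is tagged None.

```python
from nltk.tag import UnigramTagger

# Hypothetical hand-made training data in the same (word, tag) format as treebank.tagged_sents().
train_sents = [
    [("the", "DT"), ("cat", "NN"), ("sat", "VBD")],
    [("the", "DT"), ("dog", "NN"), ("barked", "VBD")],
]
tagger = UnigramTagger(train_sents)

# "loudly" never appeared in training, so it gets no tag.
print(tagger.tag(["the", "cat", "barked", "loudly"]))
# [('the', 'DT'), ('cat', 'NN'), ('barked', 'VBD'), ('loudly', None)]
```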


Python 3 Text Processing with NLTK 3 Cookbook by Jacob Perkins
