Lemmatizing words with WordNet
Lemmatization is similar to stemming, but is more akin to synonym replacement. A lemma is a root word, as opposed to a root stem, so unlike stemming, you are always left with a valid word that means the same thing. The word you end up with can, however, look completely different from the original. A few examples will illustrate this.
Getting ready
Make sure that you have unzipped the wordnet corpus in nltk_data/corpora/wordnet. This will allow the WordNetLemmatizer class to access WordNet. You should also be familiar with the part-of-speech tags covered in the Looking up Synsets for a word in WordNet recipe of Chapter 1, Tokenizing Text and WordNet Basics.
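One way to confirm that the corpus is in place before going further is to ask NLTK to locate it. The following is a minimal sketch, not part of the recipe itself: the helper name wordnet_available is hypothetical, and it assumes only that nltk.data.find raises LookupError when a resource cannot be found on NLTK's data path.

```python
def wordnet_available():
    """Return True if NLTK can locate the WordNet corpus, else False.

    wordnet_available is a hypothetical helper name used for illustration.
    """
    try:
        import nltk
        # Searches every directory on NLTK's data path for corpora/wordnet.
        nltk.data.find('corpora/wordnet')
    except (ImportError, LookupError):
        # NLTK is not installed, or the corpus is not on the data path.
        return False
    return True


print(wordnet_available())
```

If this prints False while NLTK is installed, the corpus can usually be fetched with nltk.download('wordnet').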
How to do it...
We will use the WordNetLemmatizer class to find lemmas:
>>> from ...
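As a rough sketch of what working with the class can look like (this is an illustration under stated assumptions, not the recipe's own listing), the function below lemmatizes a few words with and without a part-of-speech tag. The function name lemmatize_examples and the choice of words are assumptions made for the example; WordNetLemmatizer and its lemmatize(word, pos) method come from nltk.stem.

```python
def lemmatize_examples():
    """Lemmatize a few sample words; return None if WordNet is unavailable.

    lemmatize_examples is a hypothetical name used for illustration.
    """
    try:
        from nltk.stem import WordNetLemmatizer
        lemmatizer = WordNetLemmatizer()
        return [
            # With no pos argument the word is treated as a noun.
            lemmatizer.lemmatize('cooking'),
            # Tagged as a verb, the same word reduces to its root verb.
            lemmatizer.lemmatize('cooking', pos='v'),
            # A plural noun reduces to its singular form.
            lemmatizer.lemmatize('cookbooks'),
        ]
    except (ImportError, LookupError):
        # NLTK or the WordNet corpus is missing.
        return None


print(lemmatize_examples())
# With WordNet installed: ['cooking', 'cook', 'cookbook']
```

The part-of-speech tag matters because the lemmatizer treats every word as a noun unless told otherwise, which is why 'cooking' comes back unchanged in the first call.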