Converting words to their base forms using lemmatization

Lemmatization is another way of reducing words to their base forms. In the previous section, we saw that the base forms produced by the stemmers were not always real words. For example, all three stemmers reduced calves to calv, which is not a real word. Lemmatization takes a more structured approach to solve this problem.

The lemmatization process uses a vocabulary and morphological analysis of words. It obtains the base form by removing inflectional endings such as ing or ed, and the resulting base form is known as the lemma. If you lemmatize the word calves, you should get calf as the output. One thing to note is that the output depends on whether ...
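To make the contrast with stemming concrete, here is a minimal toy sketch of the idea, not how any particular library implements it. The vocabulary, the exception table, the suffix rules, and the function names toy_stem and toy_lemmatize are all hypothetical and exist only for illustration. A real lemmatizer relies on a far larger lexical resource.

```python
# Toy illustration only: a tiny hand-made vocabulary standing in for a real lexicon.
VOCABULARY = {"calf", "wolf", "write", "play", "table"}

# Irregular forms that no suffix rule can recover (hypothetical sample entries).
EXCEPTIONS = {"calves": "calf", "wolves": "wolf", "wrote": "write"}

# (suffix, replacement) pairs tried in order; a candidate is accepted
# only if it turns out to be a real word in VOCABULARY.
SUFFIX_RULES = [("ing", ""), ("ing", "e"), ("ed", ""), ("ed", "e"), ("s", "")]


def toy_stem(word):
    """Crude stemmer for contrast: chop a known suffix, with no vocabulary check."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Return a base form that is guaranteed to be in VOCABULARY, if one is found."""
    word = word.lower()
    if word in VOCABULARY:
        return word
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in VOCABULARY:
                return candidate
    # Unknown word: leave it unchanged rather than invent a non-word.
    return word


print(toy_stem("calves"))        # calv
print(toy_lemmatize("calves"))   # calf
print(toy_lemmatize("writing"))  # write
```

The key design choice in this sketch is that every candidate base form is checked against the vocabulary, which is why it returns calf where the crude suffix chopper returns calv.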
