Stemming, lemmatization, and stopwords

Stemming and lemmatization are two distinct but closely related techniques that attempt to reduce every word to its base form, which simplifies the language model by shrinking its vocabulary. For instance, if we were to stem the various forms of the word cat, we'd make the following transformation:

cat, cats, cat's, cats' -> cat

The difference between lemmatization and stemming lies in how we make this transformation. Stemming is done algorithmically, by applying rules that strip affixes from a word. When applied to multiple forms of the same word, the extracted root should be the same most of the time. Lemmatization, by contrast, uses a vocabulary of known base forms and takes into account how the word was used.
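To make the contrast concrete, here is a minimal sketch in plain Python. It is a toy, not the Porter stemmer or any library's lemmatizer: the names toy_stem, toy_lemmatize, SUFFIXES, and LEMMAS are hypothetical, and the tiny lookup table stands in for the much larger vocabulary a real lemmatizer would consult.

```python
# Toy illustration only: not the Porter stemmer or any library's lemmatizer.

# Suffixes are tried in this order, so the possessive forms come first.
SUFFIXES = ("s'", "'s", "s")


def toy_stem(word: str) -> str:
    """Strip one known suffix by rule; no dictionary is consulted."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical mini-vocabulary: (surface form, part of speech) -> base form.
LEMMAS = {
    ("cats", "noun"): "cat",
    ("meeting", "noun"): "meeting",
    ("meeting", "verb"): "meet",
    ("was", "verb"): "be",
}


def toy_lemmatize(word: str, pos: str) -> str:
    """Look the word up by form and usage; fall back to the word itself."""
    word = word.lower()
    return LEMMAS.get((word, pos), word)


forms = ["cat", "cats", "cat's", "cats'"]
print([toy_stem(w) for w in forms])   # all four forms reduce to "cat"
print(toy_lemmatize("meeting", "noun"), toy_lemmatize("meeting", "verb"))
print(toy_stem("was"), toy_lemmatize("was", "verb"))
```

The rule-based function handles the four forms of cat without knowing anything about the word, but it has no way to map an irregular form such as was to be. The lookup-based function can do that, and it can return different base forms for meeting depending on whether it was used as a noun or a verb, at the cost of needing the vocabulary and the part of speech up front.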

Stemming is typically much faster than ...
