Stemming and lemmatization are two distinct but closely related techniques that attempt to reduce each word to its base form, which simplifies the language model because it has fewer distinct word forms to handle. For instance, if we were to stem the various forms of the word cat, we'd make the following transformation:
cat, cats, cat's, cats' -> cat
The difference between lemmatization and stemming lies in how we make this transformation. Stemming is done algorithmically, by applying rules that strip affixes without consulting a dictionary. When applied to multiple forms of the same word, the extracted root should be the same most of the time, although it is not guaranteed to be a real word. Lemmatization, by contrast, uses a vocabulary of known base forms and takes into account how the word was used, such as its part of speech.
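To make the contrast concrete, here is a minimal sketch in Python. It is not a real stemmer or lemmatizer: the names toy_stem, toy_lemmatize, SUFFIXES, and LEMMAS are hypothetical, the suffix list is a handful of illustrative rules rather than an established algorithm such as Porter's, and the tiny lookup table stands in for the full vocabulary a real lemmatizer would consult.

```python
# Toy suffix rules, tried in order. These are illustrative assumptions,
# not the rules of any established stemming algorithm.
SUFFIXES = ("'s", "s'", "ing", "ed", "s")

# Toy vocabulary of known base forms, keyed by (word, part of speech).
# A real lemmatizer would consult a far larger dictionary.
LEMMAS = {
    ("cats", "noun"): "cat",
    ("meeting", "noun"): "meeting",
    ("meeting", "verb"): "meet",
    ("better", "adj"): "good",
}


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word, pos):
    """Look up the base form for a word used as the given part of speech."""
    word = word.lower()
    # Fall back to the word itself when it is not in the vocabulary.
    return LEMMAS.get((word, pos), word)


if __name__ == "__main__":
    # Rules alone handle the regular forms of "cat".
    print([toy_stem(w) for w in ["cat", "cats", "cat's", "cats'"]])
    # ['cat', 'cat', 'cat', 'cat']

    # The stemmer ignores usage; the lemmatizer depends on it.
    print(toy_stem("meeting"))               # meet
    print(toy_lemmatize("meeting", "noun"))  # meeting
    print(toy_lemmatize("meeting", "verb"))  # meet

    # An irregular form needs the vocabulary; the rules leave it unchanged.
    print(toy_stem("better"))                # better
    print(toy_lemmatize("better", "adj"))    # good
```

The stemmer returns the same output for "meeting" regardless of context, while the lemmatizer gives different answers for the noun and the verb. That is the trade-off described above: rules are cheap and need no dictionary, whereas a vocabulary plus usage information can handle irregular forms and ambiguous words.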