Normalizing and lemmatizing

In the previous section, I wrote that all the words in the second example, "she shan't be excessively learned", are already in the dictionary from the first sentence. The observant reader might note that the word "be" isn't actually in the dictionary. From a linguistics point of view, however, my claim isn't necessarily false: "be" is the root word of "is", of which "was" is the past tense. The idea here is that instead of adding words to the dictionary directly, we should add their root words. This is called lemmatization. Continuing from the previous example, the following are the lemmatized words from the first sentence:

the child be learn a new word and be use it excessively shall not she cry

Again, here I would like to point out some inconsistencies ...
