Fundamentals of NLP

NLP, at its core, works by splitting a chunk of text (also referred to as a corpus) into individual segments, or tokens, and then analyzing them. These tokens might simply be individual words, but they might also be contractions. Let's look at how a computer might interpret the phrase "I have watered the plants."

If we were to split this corpus into tokens, it would probably look something like this:

['I', 'have', 'watered', 'the', 'plants']
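The splitting step can be sketched in a few lines of Python. This is only a minimal illustration, not the approach the book goes on to use: the function name tokenize and the regular expression are assumptions made for this example, and a real tokenizer handles many more cases (numbers, hyphens, punctuation inside words).

```python
import re

def tokenize(text):
    # Hypothetical helper: keep runs of letters and apostrophes,
    # so contractions such as "don't" stay together as one token.
    return re.findall(r"[A-Za-z']+", text)

tokens = tokenize("I have watered the plants.")
print(tokens)  # ['I', 'have', 'watered', 'the', 'plants']
```

The trailing period is dropped because it matches neither a letter nor an apostrophe.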

The word "the" in our corpus is unnecessary, as it does not help us understand the phrase's intent; the same goes for the word "have". We should therefore remove these surplus words:

['I', 'watered', 'plants']
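Removing surplus words like this might look as follows in Python. The STOP_WORDS set and the remove_stop_words function are hypothetical names chosen for this sketch, and the word list is deliberately tiny; NLP libraries typically ship much longer lists.

```python
# Assumed, illustrative list of words that carry little intent.
STOP_WORDS = {"a", "an", "the", "have", "has", "had"}

def remove_stop_words(tokens):
    # Compare in lowercase so "The" and "the" are treated alike,
    # but return the tokens with their original casing.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = ['I', 'have', 'watered', 'the', 'plants']
filtered = remove_stop_words(tokens)
print(filtered)  # ['I', 'watered', 'plants']
```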

Already, this is starting to look more usable. We have a personal pronoun in the form of an actor ...
