Feature extraction

As mentioned earlier in this chapter, the NLP system does not understand string values. They need numerical input to build models, sometimes they are also called numerical features. Feature extraction in NLP is converting a set of text information into a set of numerical features. Any machine learning algorithm that you are going to train would need features in numerical vector forms as it does not understand the string. There are many ways text can be represented as numerical vectors. Some such ways are One hot encoding, TF-IDF, Word2Vec, and CountVectorizer.

Get Artificial Intelligence for Big Data now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.