Feature extractors

We have seen how feature transformers convert, modify, and standardize our documents in a preprocessing pipeline, turning raw text into a collection of tokens. Feature extractors take these tokens and generate feature vectors from them, which can then be used to train machine learning models. Two feature extractors commonly used in NLP are the bag of words and term frequency–inverse document frequency (TF–IDF) algorithms.
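To make the idea concrete, here is a minimal sketch in plain Python of how tokens might be turned into bag of words and TF–IDF vectors. It is an illustration under stated assumptions, not the book's own code: the function names `bag_of_words` and `tf_idf`, the toy documents, and the smoothed IDF formula log((N + 1) / (df + 1)) are all choices made for this example, and other TF–IDF variants weight terms differently.

```python
import math
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary term occurs in one tokenized document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tf_idf(documents):
    """Return the vocabulary and one TF-IDF vector per tokenized document.

    Assumption: a smoothed IDF, log((N + 1) / (df + 1)), where N is the
    number of documents and df is the number of documents containing the term.
    """
    vocabulary = sorted({term for doc in documents for term in doc})
    n_docs = len(documents)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    idf = {
        term: math.log((n_docs + 1) / (doc_freq[term] + 1))
        for term in vocabulary
    }

    vectors = []
    for doc in documents:
        term_counts = bag_of_words(doc, vocabulary)
        vectors.append(
            [count * idf[term] for count, term in zip(term_counts, vocabulary)]
        )
    return vocabulary, vectors


# Hypothetical, already-tokenized documents (the output of a preprocessing pipeline).
docs = [
    ["spark", "is", "fast"],
    ["spark", "is", "scalable"],
    ["python", "is", "popular"],
]

vocabulary, vectors = tf_idf(docs)
print(vocabulary)
print(bag_of_words(docs[0], vocabulary))
print(vectors[0])
```

In this sketch, a term such as "is" that appears in every document receives a weight of zero, while a term that appears in only one document is weighted more heavily than one shared by several. That is the intuition behind TF–IDF: raw counts are scaled down for terms that do little to distinguish one document from another.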
