Feature extraction

In this section, we turn to feature extraction: developing new features or variables from the features and information already available in our working datasets. We will also discuss some of Apache Spark's special capabilities for feature extraction, along with related feature solutions that Spark makes easy.

After this section, we will be able to develop and organize features for various machine learning projects.

Feature development challenges

In most big data machine learning projects, which involve many large datasets, we often cannot use the data immediately. For example, web log data is very messy when we first take it in, and often arrives as a collection of random text, from which we need to extract useful ...
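To make the web log example concrete, here is a minimal sketch in plain Python of turning one raw log line into a handful of features. It is an illustration only, not the book's method and not a Spark API: it assumes the logs follow the Apache Common Log Format, and the names LOG_PATTERN and extract_features, the sample line, and the chosen features (hour, weekend flag, path depth, error flag) are all hypothetical.

```python
import re
from datetime import datetime

# Assumption: lines follow the Apache Common Log Format; real logs may differ.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def extract_features(line):
    """Return a dict of derived features for one log line, or None if it does not parse."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # messy lines are skipped rather than guessed at

    timestamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    status = int(match.group("status"))
    size = match.group("size")
    # Drop the query string, then count the non-empty path segments.
    segments = [p for p in match.group("path").split("?")[0].split("/") if p]

    return {
        "hour": timestamp.hour,                  # time-of-day feature
        "is_weekend": timestamp.weekday() >= 5,  # Saturday or Sunday
        "method": match.group("method"),
        "path_depth": len(segments),
        "is_error": status >= 400,               # client or server error
        "bytes": 0 if size == "-" else int(size),
    }


# Hypothetical sample line, made up for illustration.
sample = ('127.0.0.1 - - [10/Oct/2015:13:55:36 -0700] '
          '"GET /products/item/42?ref=home HTTP/1.0" 200 2326')
features = extract_features(sample)
print(features)
```

Because the function works on one line at a time and returns None for lines it cannot parse, the same logic could in principle be applied across a large dataset with a map step followed by a filter, which is the kind of workload Spark is designed to distribute.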
