Feature preparation

In the previous section, we selected our models and prepared the dependent variable for supervised machine learning. In this section, we move on to prepare the independent variables: the features representing the factors that affect our dependent variable, sales team success. Specifically, we need to reduce our four hundred features to a reasonably sized group for final modeling. To do this, we will employ PCA, draw on subject knowledge, and then perform some feature selection tasks.

PCA

PCA (principal component analysis) is a mature and commonly used feature reduction method, often applied to find a small set of variables that accounts for most of the variance. Technically, the goal ...
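To make the idea of "a small set of variables that accounts for most of the variance" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the Spark workflow used in this chapter: the sample rows are made up, the function names (`covariance_matrix`, `top_component`, `explained_variance_ratio`) are hypothetical, and power iteration is used here only as a simple way to approximate the first principal component.

```python
import math

# Made-up sample data (assumption): 5 observations, 3 features.
# The second feature is roughly twice the first; the third is near-constant noise.
SAMPLE_ROWS = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.4],
    [3.0, 6.2, 0.6],
    [4.0, 7.8, 0.5],
    [5.0, 10.1, 0.4],
]


def covariance_matrix(rows):
    """Return the sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_component(cov, iterations=500):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit-length starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cv[i] for i in range(d))
    return eigenvalue, v


def explained_variance_ratio(rows):
    """Share of the total variance captured by the first principal component."""
    cov = covariance_matrix(rows)
    eigenvalue, _ = top_component(cov)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    return eigenvalue / total_variance


if __name__ == "__main__":
    ratio = explained_variance_ratio(SAMPLE_ROWS)
    print(f"Share of variance in the first component: {ratio:.3f}")
```

Because two of the three sample features move together and the third barely varies, a single component captures nearly all of the variance in this toy data. With hundreds of real features, the same ratio, computed for the leading components, is one way to judge how many components are worth keeping.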
