Building a classification system with Decision Trees in Spark 2.0

In this recipe, we will use the breast cancer data for classification to demonstrate the Decision Tree implementation in Spark. We will use Information Gain (IG, which is based on entropy) and Gini impurity to show how to use the facilities Spark already provides and avoid redundant coding. This recipe fits a single tree using binary classification to train and predict the label (benign (0.0) or malignant (1.0)) for the dataset.
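Spark lets us pick the split criterion through the decision tree's `impurity` setting (`"gini"` or `"entropy"`) rather than coding it ourselves. To make the two criteria concrete, the following is a minimal, framework-free sketch in plain Python; it is not Spark's own code. The helper names (`gini`, `entropy`, `impurity_gain`) and the toy label lists are hypothetical and chosen only for illustration, using the recipe's label convention of 0.0 for benign and 1.0 for malignant.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Fraction of samples carrying each label value (e.g. 0.0 or 1.0)."""
    counts = Counter(labels)
    total = len(labels)
    return [c / total for c in counts.values()]


def gini(labels):
    """Gini impurity: 1 - sum(p_i^2). Zero for a pure node."""
    if not labels:
        return 0.0
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    """Entropy in bits: -sum(p_i * log2(p_i)). Zero for a pure node."""
    if not labels:
        return 0.0
    return -sum(p * math.log2(p) for p in class_probabilities(labels) if p > 0)


def impurity_gain(parent, left, right, measure):
    """Parent impurity minus the size-weighted impurity of the two children.

    With measure=entropy this is the information gain (IG) of the split.
    """
    n = len(parent)
    weighted = (len(left) / n) * measure(left) + (len(right) / n) * measure(right)
    return measure(parent) - weighted


if __name__ == "__main__":
    # Hypothetical toy node: 4 benign (0.0) and 4 malignant (1.0) samples.
    parent = [0.0] * 4 + [1.0] * 4
    perfect = ([0.0] * 4, [1.0] * 4)                      # separates classes fully
    mixed = ([0.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 1.0])  # leaves both children impure
    for name, measure in (("gini", gini), ("entropy", entropy)):
        print(name,
              impurity_gain(parent, *perfect, measure),
              impurity_gain(parent, *mixed, measure))
```

Under either measure the perfect split scores higher than the mixed one; a tree learner evaluates candidate splits this way and keeps the one with the largest gain at each node.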
