Getting and preparing real-world medical data for exploring Decision Trees and Ensemble models in Spark 2.0

The dataset used here illustrates a real-life application of decision trees in machine learning: we use a cancer dataset to predict whether a patient's case is malignant or not. To explore the real power of decision trees, we need data like this, which exhibits real-life non-linearity and a complex error surface.
