Decision trees

A decision tree is a non-parametric supervised learning algorithm which can be used for both classification and regression. Decision trees are like inverted trees with the root node at the top and leaf nodes forming downwards. There are different algorithms to split the dataset into branch-like segments. Each leaf node is assigned to a class that represents the most appropriate target values.

Decision trees do not require any scaling or transformations of the dataset and work as the data is. They can handle both categorical and continuous features, and also address non-linearity in the dataset. At its core, a decision tree is a greedy algorithm (it considers the best split at the moment and does not take into consideration the future ...

Get Spark for Data Science now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.