Classification and Regression Trees

We have seen how linear regression models allow us to predict a numerical outcome, and how logistic regression models allow us to predict a categorical outcome. However, both of these models assume a linear relationship between variables. Classification and Regression Trees (CART) overcome this problem by generating Decision Trees, which are also much easier to interpret compared to the supervised learning models we have seen so far. These decision trees can then be traversed to come to a final decision, where the outcome can either be numerical (regression trees) or categorical (classification trees). A simple classification tree used by a mortgage lender is illustrated in Figure 4.7:

Figure 4.7: Simple ...

Get Machine Learning with Apache Spark Quick Start Guide now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.