DECISION TREES

As the number of potential predictors increases, the method of linear regression becomes less and less practical. With three potential predictors, we can have as many as eight coefficients to be estimated: one for the intercept, three for first-order terms in the predictors P_i, three for second-order terms of the form P_iP_j, and one for the third-order term P_1P_2P_3. With k variables, we have k first-order terms, k(k − 1)/2 second-order terms, and so forth. Should all these terms be included in our model? Which ones should be neglected? With so many possible combinations, will a single equation be sufficient?
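
To see how quickly the terms pile up, consider a short computation, sketched here in Python (the code is illustrative and not part of the authors' text): counting the intercept together with every product of distinct predictors gives 2^k candidate coefficients for k predictors.

    from math import comb

    def candidate_terms(k):
        # 1 intercept + C(k, r) products of r distinct predictors, r = 1..k
        return 1 + sum(comb(k, r) for r in range(1, k + 1))

    for k in (3, 5, 10):
        print(k, candidate_terms(k))  # 3 -> 8, 5 -> 32, 10 -> 1024

Even ten predictors already offer over a thousand candidate terms, which is the impracticality the text describes.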

We need to consider alternate approaches. If you are a mycologist, a botanist, a herpetologist, or simply a nature lover, you may have made use of some sort of key. For example:

1. Leaves simple?
   a. Leaves needle-shaped?
      i. Leaves in clusters of two to many?
         (a) Leaves in clusters of two to five, sheathed, persistent for several years?

Which is to say that one classifies objects according to whether or not they possess a particular characteristic. One could accomplish the same result by means of logistic regression, but the latter seems somewhat contrived.
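
Such a key is nothing more than a chain of yes/no tests, which we might hand-code as follows. The character names and group labels in this sketch are invented for illustration; the point is only that each question splits the specimens on the presence or absence of a single characteristic.

    def key(leaf):
        """Classify a specimen by yes/no characters, as the key above does.
        Character names and group labels are placeholders, not real taxonomy."""
        if not leaf["simple"]:
            return "compound-leaved group"
        if not leaf["needle_shaped"]:
            return "broadleaf group"
        if not leaf["clustered"]:
            return "single-needle group"
        if leaf["clusters_of_2_to_5"] and leaf["sheathed"]:
            return "pines and their relatives"
        return "other clustered conifers"

    print(key({"simple": True, "needle_shaped": True, "clustered": True,
               "clusters_of_2_to_5": True, "sheathed": True}))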

The Classification And Regression Tree (CART) proposed by Breiman, Friedman, Olshen, and Stone [1984] is simply a method of automating the process of classification, so that the initial bifurcation, “Leaves simple” in the preceding example, provides the most effective division of the original sample, and so on.
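
A minimal sketch of what such automation looks like in practice, using scikit-learn's CART-style DecisionTreeClassifier on Fisher's iris data (both choices are ours, for illustration; the authors do not prescribe a package or a data set):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    tree.fit(iris.data, iris.target)

    # Print the fitted tree as text; it reads like the couplets of a key.
    print(export_text(tree, feature_names=list(iris.feature_names)))

The printed splits read exactly like a botanical key: the root split is the single most effective division of the sample, and each branch is then divided in turn.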

We have ...
