Chapter 8

Into the Forest, Randomly

IN THIS CHAPTER

Looking at random forests

Growing a random forest for irises

Developing a random forest for glass identification

In Chapter 7, I help you explore decision trees. Think of a decision tree as an expert decision-maker: Give a tree a set of data, and it makes decisions about the data. Taking this idea a step further, suppose you have a panel of experts (a group of decision trees), and each one makes a decision about the same data. You could poll the panel to come up with the best decision.

This is the idea behind the random forest: a collection of decision trees that you poll, with the majority vote serving as the decision.
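To make the polling idea concrete, here is a minimal sketch in base R. The five votes are made up for illustration (they don't come from real trees), and the variable names are hypothetical. The point is only to show how a majority vote turns a panel's individual decisions into one decision.

```r
# Hypothetical votes from a panel of five decision trees,
# each classifying the same iris flower (illustrative values only)
votes <- c("setosa", "versicolor", "setosa", "setosa", "virginica")

# Tally how many trees voted for each species
tally <- table(votes)

# The species with the most votes is the panel's decision
decision <- names(tally)[which.max(tally)]

print(tally)
print(decision)
```

With three of the five votes, setosa wins the poll. A real random forest works with many more trees, but the tallying step follows the same logic.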

Growing a Random Forest

So how does all this happen? How do you create a forest out of a dataset? Well, randomly.

Here's what I mean. In Chapter 7, I discuss the creation of a decision tree from a dataset. I use the rattle package to partition a data frame into a training set, a validation set, and a test set. The partitioning takes place as a result of random sampling from the rows in the data frame. The default condition is that rattle randomly assigns 70 percent of the rows to the training set, 15 percent to the ...
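The random partitioning described here can be approximated in base R. The sketch below is not rattle's own code. It assumes the built-in iris data frame, an arbitrary seed, and a 70/15/15 split. The 70 percent training share comes from the text, while the way the remaining rows are divided between validation and test is an assumption.

```r
# A rough base R imitation of a random train/validation/test partition.
# Assumptions: the iris data frame, seed 42, and a 70/15/15 split
# (only the 70 percent training share is stated in the text).
set.seed(42)                      # arbitrary seed, so the split is repeatable

n <- nrow(iris)                   # number of rows to partition
shuffled <- sample(n)             # row numbers in random order

n_train <- round(0.70 * n)        # rows for the training set
n_valid <- floor(0.15 * n)        # rows for the validation set

train_rows <- shuffled[1:n_train]
valid_rows <- shuffled[(n_train + 1):(n_train + n_valid)]
test_rows  <- shuffled[(n_train + n_valid + 1):n]   # whatever is left

train_set <- iris[train_rows, ]
valid_set <- iris[valid_rows, ]
test_set  <- iris[test_rows, ]

# Show how many rows landed in each set
print(c(train = nrow(train_set),
        validation = nrow(valid_set),
        test = nrow(test_set)))
```

Because the row numbers are shuffled before they're sliced, every row ends up in exactly one of the three sets, and a different seed produces a different partition.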
