Chapter 3 Decision trees

3.1 Introduction

In many applications, we want not only to use the resulting classification model to classify instances accurately, but also to inspect the model itself. Inspection makes it possible to explain the model's predictions, modify it, or combine it with existing background knowledge. In such applications, where both high classification accuracy and human readability of the model are required, decision trees will be the obvious method of choice for most data miners.

Decision tree algorithms have been studied for many years, and they are among the data mining algorithms for which particularly numerous refinements and variations have been proposed. One can therefore speak of a family of algorithms that share the same model representation and operation scheme but differ in several details. The room for this diversity is increased by the two-phase process usually used to create decision tree models, which consists of growing the tree and then pruning it. It is hardly possible to describe all these variations at the level of detail adopted by this book without substantial omissions and compromises. Only the most common ones will be discussed, and not all of them will be illustrated with R examples.
