Chapter 18

Discretization

18.1 Introduction

Continuous attribute discretization, which consists in creating discrete attributes to replace originally continuous ones, is among the most frequently used forms of attribute transformation. Unlike most others, it is complex enough to leave room for a variety of algorithms with varying levels of refinement and computational expense. Some of them are actually much closer to classification algorithms than to simple arithmetic transformations. This, along with the possible impact of discretization on the process and results of subsequent modeling, justifies presenting them separately in this chapter.

Discretized attribute values correspond to intervals into which the range of the original continuous attribute is divided. Determining the number and boundaries of these intervals so as to preserve the original attribute's predictive utility is the major challenge addressed by discretization algorithms. The most successful algorithms are those that take into account the purpose for which the discretized attribute is to be used, which is usually creating a classification model. Such discretization algorithms receive the most attention in this chapter.
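
To make the idea of intervals and boundaries concrete, the following is a minimal sketch of the simplest possible scheme, equal-width discretization, written in Python. It is only an illustration under stated assumptions: the function names `equal_width_breaks` and `discretize` and the sample `ages` data are hypothetical, and the scheme ignores the target class entirely, so it is not one of the purpose-aware algorithms this chapter concentrates on.

```python
from bisect import bisect_right

def equal_width_breaks(values, k):
    """Return the k-1 interior cut points that split [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def discretize(values, breaks):
    """Map each value to its interval index; a value equal to a cut point goes to the upper interval."""
    return [bisect_right(breaks, v) for v in values]

# Hypothetical continuous attribute (illustrative data only).
ages = [18, 22, 25, 31, 40, 47, 58, 63, 78]

breaks = equal_width_breaks(ages, 3)   # interval boundaries
codes = discretize(ages, breaks)       # discrete attribute values

print(breaks)
print(codes)
```

Here the number of intervals `k` is chosen by hand and the boundaries depend only on the attribute's minimum and maximum. Choosing both so that the attribute stays useful for a subsequent classification model is precisely the harder problem described above.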
