Grouping categorical values

In the data used for modeling, we frequently find attributes with a large number of distinct categorical values. A typical example is a product code, which identifies a product purchased by a customer.

A data attribute with many different values can cause problems for data mining algorithms. Such complex data can make the algorithms run slowly and can make patterns in the data harder to find, leading to less accurate models. A useful step in data preparation is to simplify this kind of data by grouping the values of a categorical variable into a smaller set of values, where the grouping is related to the problem to be solved.
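
To make the idea of grouping by relevance to the problem concrete, here is a minimal sketch in plain Python, outside of Modeler. It assumes a binary target and bands each product code by its observed target rate. The function name, the thresholds (`low`, `high`), the minimum count, and the sample records are all hypothetical choices for illustration, not values or steps taken from the recipe.

```python
from collections import defaultdict


def group_codes_by_target_rate(records, low=0.2, high=0.5, min_count=2):
    """Map each code to a coarse group based on its observed target rate.

    records: iterable of (code, target) pairs, with target equal to 0 or 1.
    low, high, min_count: illustrative thresholds (assumptions, not tuned).
    """
    counts = defaultdict(int)  # rows seen per code
    hits = defaultdict(int)    # rows per code where the target is 1
    for code, target in records:
        counts[code] += 1
        if target:
            hits[code] += 1

    groups = {}
    for code, n in counts.items():
        if n < min_count:
            # Too few rows to estimate a rate, so pool into a catch-all group.
            groups[code] = "other"
            continue
        rate = hits[code] / n
        if rate < low:
            groups[code] = "low"
        elif rate < high:
            groups[code] = "medium"
        else:
            groups[code] = "high"
    return groups


# Made-up sample data: (product code, target flag).
records = [
    ("P001", 1), ("P001", 1), ("P001", 0),
    ("P002", 0), ("P002", 0), ("P002", 0),
    ("P003", 1), ("P003", 0), ("P003", 0),
    ("P004", 1),
]
groups = group_codes_by_target_rate(records)
```

The resulting mapping replaces many raw codes with a handful of groups that a model can use as a single, simpler categorical field. Pooling rarely seen codes into a catch-all group is one way to avoid estimating a rate from very few rows.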

This recipe shows how to group product codes by their relation to a target ...
