CHAPTER 5 IDENTIFYING AND UNDERSTANDING GROUPS

5.1 OVERVIEW

It is often useful to decompose a data set into simpler subsets to help make sense of the entire collection of observations. These groups may reflect the types of observations found in a data set. For example, the groups might summarize the different types of customers who visit a particular shop based on collected demographic information. Finding subgroups may help to uncover relationships in the data, such as groups of consumers who buy certain combinations of products. The process of grouping a data set may also help identify rules from the data, which can in turn be used to support future decisions. For example, the process of grouping historical data can be used to understand which combinations of clinical treatments lead to the best patient outcomes. These rules can then be used to select an optimal treatment plan for new patients with the same symptoms. Finally, the process of grouping also helps discover observations dissimilar from those in the major identified groups. These outliers should be more closely examined as possible errors or anomalies.
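As a rough illustration of the ideas above, the following minimal sketch groups a handful of customer records by shared attributes and sets aside the observations that fall outside the major groups. The records, the field names (age_band, basket_type), the helper functions, and the min_size threshold are all hypothetical and chosen only for illustration. Exact-match grouping on two attributes is a deliberately simple stand-in, not a method taken from this chapter.

```python
from collections import defaultdict

# Hypothetical customer records, invented for illustration:
# (customer_id, age_band, basket_type)
customers = [
    ("c1", "18-30", "snacks"),
    ("c2", "18-30", "snacks"),
    ("c3", "18-30", "snacks"),
    ("c4", "31-50", "groceries"),
    ("c5", "31-50", "groceries"),
    ("c6", "31-50", "groceries"),
    ("c7", "51+", "electronics"),
]


def group_records(records):
    """Group customer ids by their (age_band, basket_type) combination."""
    groups = defaultdict(list)
    for customer_id, age_band, basket_type in records:
        groups[(age_band, basket_type)].append(customer_id)
    return dict(groups)


def split_major_and_outliers(groups, min_size=2):
    """Separate groups with at least min_size members from the rest.

    min_size is an assumed, arbitrary cutoff; members of smaller groups
    are returned as candidate outliers for closer inspection.
    """
    major = {key: ids for key, ids in groups.items() if len(ids) >= min_size}
    outliers = [cid for ids in groups.values() if len(ids) < min_size for cid in ids]
    return major, outliers


groups = group_records(customers)
major, outliers = split_major_and_outliers(groups)

for key, ids in major.items():
    print(key, ids)
print("candidate outliers:", outliers)
```

Here the two larger groups summarize the main types of customer, while the single record that matches neither is flagged as a candidate outlier to be examined as a possible error or anomaly.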

The identification of interesting groups is not only a common deliverable for a data analysis project, but can also support other data mining tasks such as the development of a model to use in forecasting future events (as described in Chapter 6). This is because the process of grouping and interpreting the groups of observations helps the analyst to thoroughly understand ...
