Chapter 1

An Introduction to Cluster Analysis

Charu C. Aggarwal

IBM T. J. Watson Research Center
Yorktown Heights, NY
charu@us.ibm.com

1.1 Introduction

The problem of data clustering has been widely studied in the data mining and machine learning literature because of its numerous applications to summarization, learning, segmentation, and target marketing [46, 47, 52]. In the absence of specific labeled information, clustering can be considered a concise model of the data which can be interpreted in the sense of either a summary or a generative model. The basic problem of clustering may be stated as follows:

Given a set of data points, partition them into a set of groups such that the points within each group are as similar as possible.

Note that this is a very rough definition, ...
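To make the rough problem statement concrete, here is a minimal Python sketch that compares two candidate partitions of the same four points. It rests on assumptions that do not come from the text: similarity is measured by Euclidean distance, and the quality of a partition is scored by the mean pairwise distance between points sharing a group. The function name `within_group_spread` and the sample points are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import dist


def within_group_spread(groups):
    """Mean pairwise Euclidean distance between points that share a group.

    Assumption: a lower value means the groups are more internally similar.
    """
    distances = [
        dist(p, q)
        for group in groups
        for p, q in combinations(group, 2)
    ]
    return sum(distances) / len(distances) if distances else 0.0


# Hypothetical data: two visually obvious pairs of nearby points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]

# Partition that keeps nearby points together.
good_partition = [[points[0], points[1]], [points[2], points[3]]]

# Partition that mixes points from the two pairs.
poor_partition = [[points[0], points[2]], [points[1], points[3]]]

good_score = within_group_spread(good_partition)
poor_score = within_group_spread(poor_partition)

# The partition with the smaller spread matches the problem statement better.
print(good_score < poor_score)
```

Under this assumed score, the first partition is preferred because its groups contain points that lie close together. Other similarity measures or objective functions would be equally valid readings of the definition.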
