Chapter 13

Hierarchical clustering

13.1 Introduction

Hierarchical clustering extends the basic clustering task by requesting that the created clustering model is hierarchical. With nodes of a cluster hierarchy representing clusters, and their descendants representing their subclusters, such a model can be viewed as a combination of multiple clustering models, applicable to different domain regions. It is therefore not surprising that hierarchical clustering is much more computationally demanding than flat clustering. However, this increased computational complexity does not coincide with increased conceptual or algorithmic complexity, since the process of cluster hierarchy formation can be organized as a sequence of basic cluster merging or partitioning operations.

13.1.1 Basic approaches

This chapter reviews two approaches to creating hierarchical clustering models, both of which have very simple formulations:

  1. Agglomerative clustering. A bottom-up approach which starts with many small clusters and iteratively merges selected clusters until a single root cluster is reached.
  2. Divisive clustering. A top-down approach which starts with a single root cluster and iteratively partitions existing clusters into subclusters.

For both these approaches, we will assume that cluster merging or partitioning decisions are made based on an instance dissimilarity (or similarity) measure. While they explicitly perform cluster hierarchy formation only, we will see that clustering trees can ...

Get Data Mining Algorithms: Explained Using R now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.