Agglomerative clustering

In contrast to algorithms, such as k-means, where the dataset is partitioned into individual groups, agglomerative or hierarchical clustering techniques start by considering each datapoint as its own cluster and merging them together into larger groups from the bottom up (Maimon, Oded, and Lior Rokach, eds. Data mining and knowledge discovery handbook. Vol. 2. New York: Springer, 2005). The classical application of this idea is in phylogenetic trees in evolution, where common ancestors connect individual organisms. Indeed, these methods organize the data into tree diagrams, known as dendrograms, which visualize how the data is sequentially merged into larger groups.

The basic steps of an agglomerative algorithm are (diagrammed ...

Get Mastering Predictive Analytics with Python now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.