Cover by Toby Segaran

Safari, the world’s most comprehensive technology and business learning platform.

Find the exact information you need to solve a problem on the fly, or go deeper to master the technologies and skills you need to succeed

Start Free Trial

No credit card required

O'Reilly logo

Hierarchical Clustering

Hierarchical clustering builds up a hierarchy of groups by continuously merging the two most similar groups. Each of these groups starts as a single item, in this case an individual blog. In each iteration this method calculates the distances between every pair of groups, and the closest ones are merged together to form a new group. This is repeated until there is only one group. Figure 3-1 shows this process.

Hierarchical clustering in action

Figure 3-1. Hierarchical clustering in action

In the figure, the similarity of the items is represented by their relative locations—the closer two items are, the more similar they are. At first, the groups are just individual items. In the second step, you can see that A and B, the two items closest together, have merged to form a new group whose location is halfway between the two. In the third step, this new group is merged with C. Since D and E are now the two closest items, they form a new group. The final step unifies the two remaining groups.

After hierarchical clustering is completed, you usually view the results in a type of graph called a dendrogram, which displays the nodes arranged into their hierarchy. The dendrogram for the example above is shown in Figure 3-2.

A dendrogram is a visualization of hierarchical clustering

Figure 3-2. A dendrogram is a visualization of hierarchical clustering

This dendrogram ...

Find the exact information you need to solve a problem on the fly, or go deeper to master the technologies and skills you need to succeed

Start Free Trial

No credit card required