Using a hierarchical model to cluster your data

The hierarchical clustering model aims to build a hierarchy of clusters. Conceptually, you might think of it as a decision tree of clusters: based on the similarity (or dissimilarity) between clusters, they are aggregated into more general clusters or divided into more specific ones. The agglomerative approach is often referred to as bottom-up, while the divisive approach is called top-down.
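
To make the bottom-up idea concrete, here is a minimal sketch of agglomerative merging written from scratch with NumPy. It is an illustration only, not the recipe's code: the function name naive_agglomerative, the five sample points, and the choice of the closest-member (single-linkage) distance between clusters are all assumptions made for this example.

```python
import numpy as np


def naive_agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Hypothetical helper for illustration; the distance between two
    clusters is taken to be the distance between their closest members.
    """
    # Bottom-up start: every observation is its own cluster.
    clusters = [[i] for i in range(len(points))]

    # Pairwise Euclidean distances between all observations.
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))

    merges = []
    while len(clusters) > n_clusters:
        best = None
        # Scan every pair of clusters for the smallest distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Record the merge, then join cluster b into cluster a.
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges


# Made-up sample data: two tight pairs and one distant point.
points = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [5.0, 7.0], [20.0, 0.0]])
clusters, merges = naive_agglomerative(points, 3)
print(clusters)
```

With these five points and n_clusters=3, the loop first merges points 0 and 1 (distance 1.0), then points 2 and 3 (distance 2.0), and leaves point 4 on its own. Rescanning all cluster pairs at every merge is what makes this naive version expensive on larger data.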

Getting ready

To execute this recipe, you will need pandas, SciPy, and PyLab. No other prerequisites are required.

How to do it…

Hierarchical clustering can be extremely slow for big datasets, as the complexity of the agglomerative algorithm is O(n^3). To estimate our model, we use a single-linkage algorithm that has better complexity, ...
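
As a rough sketch of what single-linkage clustering can look like with SciPy, the example below uses scipy.cluster.hierarchy.linkage with method='single' and then cuts the resulting tree with fcluster. The synthetic two-blob data, the random seed, and the choice of two flat clusters are assumptions for illustration; they are not the dataset or the exact code used in this recipe.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic data (an assumption for this sketch): two well-separated blobs.
rng = np.random.default_rng(42)
data = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(20, 2)),
])

# Build the hierarchy with single linkage on Euclidean distances.
# Each row of Z describes one merge: the two cluster ids that were joined,
# the distance between them, and the size of the newly formed cluster.
Z = linkage(data, method='single', metric='euclidean')

# Cut the tree so that at most two flat clusters remain.
labels = fcluster(Z, t=2, criterion='maxclust')

print(Z.shape)
print(labels)
```

The linkage matrix Z holds the whole hierarchy, so the same Z can be cut at different levels by changing t, or passed to scipy.cluster.hierarchy.dendrogram to draw the tree with Matplotlib.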
