Finding groups of potential subscribers with DBSCAN and BIRCH algorithms

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) were among the first clustering algorithms designed to handle noisy data effectively. Noise here means data points that seem completely out of place when compared with the rest of the dataset. DBSCAN puts such observations into an unclassified bucket, while BIRCH treats them as outliers and can discard them from the dataset.
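To make the "unclassified bucket" concrete, here is a minimal sketch using scikit-learn's DBSCAN, which marks noise points with the label -1. The data is synthetic and made up for illustration, and the eps and min_samples values are assumptions chosen to suit it, not settings taken from the recipe.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic data (illustrative only): two tight groups and one stray point
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(20, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.1, size=(20, 2))
stray = np.array([[20.0, 20.0]])
X = np.vstack([group_a, group_b, stray])

# eps and min_samples are assumed values that fit this toy data
model = DBSCAN(eps=1.0, min_samples=5).fit(X)
labels = model.labels_

# scikit-learn assigns the label -1 to points it considers noise
n_clusters = len(set(labels) - {-1})
n_noise = int((labels == -1).sum())
print("clusters:", n_clusters, "noise points:", n_noise)
```

The stray point has no neighbors within eps, so it stays in the noise bucket instead of being forced into one of the two groups.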

Getting ready

To execute this recipe, you will need pandas and scikit-learn. No other prerequisites are required.

How to do it…

Both algorithms are available in scikit-learn. To use DBSCAN, use the code found in the clustering_dbscan.py ...
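As a rough illustration of how the two scikit-learn estimators are called on a pandas DataFrame, consider the sketch below. It is not the contents of clustering_dbscan.py: the column names (age, balance), the synthetic values, and every parameter are hypothetical and chosen only so that the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import Birch, DBSCAN
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two groups of 50 customers each (made-up columns and values)
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "age": np.concatenate([rng.normal(30, 2, 50), rng.normal(60, 2, 50)]),
    "balance": np.concatenate([rng.normal(1000, 100, 50),
                               rng.normal(5000, 100, 50)]),
})

# Put both features on a comparable scale before measuring distances
X = StandardScaler().fit_transform(df[["age", "balance"]])

# DBSCAN: eps and min_samples are assumed values; label -1 would mark noise
df["dbscan"] = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# BIRCH: threshold and n_clusters are assumed values
df["birch"] = Birch(threshold=0.5, n_clusters=2).fit_predict(X)

print(df.groupby(["dbscan", "birch"]).size())
```

Note that scikit-learn's Birch assigns a label to every row, so any outlier filtering has to happen as a separate step; DBSCAN, by contrast, reports noise directly through the -1 label.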
