Clustering

In this example, we will look at a cluster finding algorithm in Scikit-learn called DBSCAN. DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise, and is a clustering algorithm that favors groups of points and can identify points outside any of these groups (clusters) as noise (outliers). As with the linear machine learning methods, Scikit-learn makes it very easy to work with it. We first read in the data from  Chapter 5 Clustering, with Pandas' read_pickle function:

TABLE_FILE = 'data/test.pick' 
mycat = pd.read_pickle(TABLE_FILE) 

As with the previous dataset, to refresh your memory, we plot the data. It contains a slice of the mapped nearby Universe, that is, galaxies with determined positions (direction and ...

Get Mastering Python Data Analysis now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.