O'Reilly logo

Mastering Python Data Analysis by Luiz Felipe Martins, Magnus Vilhelm Persson

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Clustering

In this example, we will look at a cluster finding algorithm in Scikit-learn called DBSCAN. DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise, and is a clustering algorithm that favors groups of points and can identify points outside any of these groups (clusters) as noise (outliers). As with the linear machine learning methods, Scikit-learn makes it very easy to work with it. We first read in the data from  Chapter 5 Clustering, with Pandas' read_pickle function:

TABLE_FILE = 'data/test.pick' 
mycat = pd.read_pickle(TABLE_FILE) 

As with the previous dataset, to refresh your memory, we plot the data. It contains a slice of the mapped nearby Universe, that is, galaxies with determined positions (direction and ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required