Clustering data with the k-means algorithm

The k-means algorithm is probably the most widely known data mining technique for clustering vectorized data. It partitions the observations into discrete clusters based on their similarity: each observation is assigned to the cluster whose centroid is nearest, as measured by Euclidean distance.
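To make the assignment rule concrete, here is a minimal sketch of the nearest-centroid step in plain Python. The function name assign_to_nearest_centroid and the sample points are hypothetical and are not taken from the recipe's code. The full algorithm alternates this step with recomputing each centroid as the mean of its assigned observations.

```python
import math


def assign_to_nearest_centroid(observations, centroids):
    '''
        Return, for each observation, the index of the centroid
        that is closest in Euclidean distance
    '''
    labels = []
    for observation in observations:
        # Euclidean distance from this observation to every centroid
        distances = [math.dist(observation, c) for c in centroids]
        # the deciding factor: pick the nearest centroid
        labels.append(distances.index(min(distances)))
    return labels


# hypothetical data: two points near the origin, two near (10, 10)
observations = [(0.0, 0.0), (0.5, 0.2), (9.0, 9.5), (10.0, 10.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

labels = assign_to_nearest_centroid(observations, centroids)
print(labels)  # [0, 0, 1, 1]
```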

Getting ready

To run this recipe, you need pandas and scikit-learn (referred to here as Scikit). No other prerequisites are required.

How to do it…

Scikit offers several clustering models in its cluster submodule. Here, we use .KMeans(...) to estimate our clustering model (see the clustering_kmeans.py file):

def findClusters_kmeans(data):
    '''
        Cluster data using k-means
    '''
    # create the classifier object
    kmeans = cl.KMeans( ...
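The excerpt above is cut off before the arguments to .KMeans(...), so the recipe's actual settings are not shown. As a separate, self-contained illustration, the sketch below fits a k-means model with scikit-learn on a small made-up DataFrame. It assumes that cl is an alias for sklearn.cluster. The function name find_clusters_kmeans_sketch, the parameter values (n_clusters=2, n_init=10, random_state=0), and the data are all assumptions chosen for the example, not the book's values.

```python
import pandas as pd
import sklearn.cluster as cl  # assumed meaning of the `cl` alias


def find_clusters_kmeans_sketch(data, n_clusters=2):
    '''
        Fit a k-means model; parameter values here are
        illustrative assumptions, not the recipe's settings
    '''
    # create the estimator object
    kmeans = cl.KMeans(
        n_clusters=n_clusters,  # number of clusters to form
        n_init=10,              # restarts with different initial centroids
        random_state=0          # fixed seed for repeatable results
    )
    # fit the model to the data and return the fitted estimator
    return kmeans.fit(data)


# hypothetical data: two well-separated groups of points
data = pd.DataFrame({
    'x': [0.0, 0.2, 0.1, 9.8, 10.0, 10.1],
    'y': [0.1, 0.0, 0.3, 10.2, 9.9, 10.0],
})

model = find_clusters_kmeans_sketch(data)
labels = model.labels_             # cluster index of each row
centroids = model.cluster_centers_  # one row per cluster centroid
print(labels)
print(centroids)
```

The numeric labels are arbitrary identifiers: which group is called 0 and which is called 1 depends on the initialization, so only the grouping itself is meaningful.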
