Evaluating the performance of clustering algorithms

So far, we have built several clustering algorithms but have not measured their performance. In supervised learning, we can compare the predicted values with the original labels to compute accuracy. In unsupervised learning, there are no labels, so we need a different way to measure how well our algorithms perform.

A good way to evaluate a clustering algorithm is to see how well separated the clusters are and how tightly the datapoints within each cluster are grouped. We need a metric that quantifies both properties. We will use a metric called the Silhouette Coefficient score. It is defined for each datapoint as follows:

score = (x – ...
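In practice, we rarely need to compute this score by hand. The following is a minimal sketch, assuming scikit-learn and NumPy are installed. The synthetic data, the choice of k-means, and the names X, scores, and best_k are illustrative assumptions, not taken from the text above. It uses sklearn.metrics.silhouette_score, which returns the mean of the per-datapoint scores (each between -1 and 1, where higher means tighter, better separated clusters), to compare several candidate cluster counts:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical toy data: three well-separated 2D blobs of 50 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit k-means for several cluster counts and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The cluster count with the highest mean score is the best candidate
best_k = max(scores, key=scores.get)

for k, s in scores.items():
    print(f"k = {k}: silhouette score = {s:.3f}")
print("Best number of clusters:", best_k)
```

Because the score needs only the data and the predicted cluster labels, it lets us compare clusterings without any ground-truth labels.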
