Silhouette score

This measure doesn't require the ground truth and assesses the intra-cluster cohesion and the inter-cluster separation at the same time. To define the Silhouette score, we need to introduce two auxiliary functions. The first one is the average intra-cluster distance of a sample x_i belonging to a cluster C_j:

$$a(x_i) = \frac{1}{n(j)} \sum_{x_t \in C_j} d(x_i, x_t)$$

In the previous expression, n(j) is the number of samples assigned to the cluster C_j and d(a, b) is a standard distance function (in the majority of cases, the Euclidean distance is chosen). We also need to define the lowest inter-cluster distance, which can be interpreted as the average nearest-cluster ...
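The two auxiliary functions can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the function names (avg_intra_cluster_distance, lowest_inter_cluster_distance) are hypothetical, the distance is Euclidean, the intra-cluster average divides by the number of samples assigned to the cluster as described above, and the inter-cluster function assumes the usual reading of "lowest" as the minimum, over all other clusters, of the average distance to their samples.

```python
import math


def euclidean(a, b):
    # Standard Euclidean distance between two points given as sequences.
    return math.dist(a, b)


def avg_intra_cluster_distance(i, X, labels, d=euclidean):
    # Average distance from X[i] to the samples assigned to its own cluster.
    # Assumption: the sum is divided by the cluster size, so the zero
    # distance from X[i] to itself is included in the average.
    members = [x for x, lab in zip(X, labels) if lab == labels[i]]
    return sum(d(X[i], x) for x in members) / len(members)


def lowest_inter_cluster_distance(i, X, labels, d=euclidean):
    # Assumption: minimum, over every other cluster, of the average
    # distance from X[i] to the samples of that cluster.
    averages = []
    for c in set(labels) - {labels[i]}:
        members = [x for x, lab in zip(X, labels) if lab == c]
        averages.append(sum(d(X[i], x) for x in members) / len(members))
    return min(averages)


# Tiny hand-made dataset: two samples in cluster 0, one in cluster 1.
X = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0)]
labels = [0, 0, 1]

a0 = avg_intra_cluster_distance(0, X, labels)
b0 = lowest_inter_cluster_distance(0, X, labels)
```

A small intra-cluster value together with a large inter-cluster value indicates that the sample sits well inside its own cluster and far from the nearest other one.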
