Assessing the performance of a clustering method

Without knowing the true labels, we cannot use the metrics introduced in the previous chapter. In this recipe, we will introduce three internal evaluation metrics that help us assess the effectiveness of our clustering methods: Davies-Bouldin, Pseudo-F (sometimes referred to as Calinski-Harabasz), and the Silhouette Score. In contrast, if we knew the true labels, we could use external measures such as the Adjusted Rand Index, Homogeneity, or Completeness scores, to name a few.
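As a rough illustration of the three internal metrics, the sketch below scores k-means clusterings for several values of k using scikit-learn. The synthetic dataset, the choice of k-means, and the helper name `internal_scores` are assumptions made for this example only; they are not taken from the recipe's own code.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Assumption: synthetic data stands in for the recipe's dataset.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)


def internal_scores(data, n_clusters):
    """Cluster `data` with k-means and return the three internal metrics."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    labels = model.fit_predict(data)
    return {
        # Lower is better; 0 is the minimum.
        "davies_bouldin": davies_bouldin_score(data, labels),
        # Higher is better (Pseudo-F, also known as Calinski-Harabasz).
        "pseudo_f": calinski_harabasz_score(data, labels),
        # Ranges from -1 to 1; higher is better.
        "silhouette": silhouette_score(data, labels),
    }


# Compare a few candidate cluster counts without using any true labels.
scores = {k: internal_scores(X, k) for k in (2, 3, 4, 5)}
for k, s in scores.items():
    print(
        f"k={k}: Davies-Bouldin={s['davies_bouldin']:.3f}, "
        f"Pseudo-F={s['pseudo_f']:.1f}, Silhouette={s['silhouette']:.3f}"
    )
```

None of the three needs the true labels, so they can be compared across candidate values of k. Note that they do not share a direction: a lower Davies-Bouldin value is better, while higher Pseudo-F and Silhouette values are better.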

Note

Refer to the scikit-learn documentation on clustering for a deeper overview of the various external evaluation metrics for clustering methods:

http://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation ...
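For contrast, when the true labels are available, external metrics can be computed directly from two label vectors. The snippet below is a minimal sketch using scikit-learn's `adjusted_rand_score`, `homogeneity_score`, and `completeness_score`; the tiny hand-made label lists are hypothetical and serve only to show how the scores behave.

```python
from sklearn.metrics import (
    adjusted_rand_score,
    completeness_score,
    homogeneity_score,
)

# Hypothetical ground truth: two classes of three points each.
true_labels = [0, 0, 0, 1, 1, 1]

# Same partition as the truth, but with the cluster names swapped.
renamed = [1, 1, 1, 0, 0, 0]

# Everything lumped into a single cluster.
merged = [0, 0, 0, 0, 0, 0]


def external_scores(truth, predicted):
    """Return the three external metrics for one predicted labelling."""
    return {
        "ari": adjusted_rand_score(truth, predicted),
        "homogeneity": homogeneity_score(truth, predicted),
        "completeness": completeness_score(truth, predicted),
    }


renamed_scores = external_scores(true_labels, renamed)
merged_scores = external_scores(true_labels, merged)

print("renamed:", renamed_scores)
print("merged: ", merged_scores)
```

The renamed labelling scores perfectly on all three metrics because they compare partitions rather than label names. The merged labelling remains complete (each class sits entirely inside one cluster) but is not homogeneous, which is why the two scores are usually read together.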
