We talked a little about assessing clusters when the ground truth is not known. However, we have not yet talked about assessing KMeans when the ground truth is known. In many cases the ground truth isn't knowable; however, if there is outside annotation, we will sometimes know the ground truth, or at least a proxy for it.
So, let's assume a world where we have some outside agent supplying us with the ground truth.
We'll create a simple dataset, measure the correctness of the clustering against the ground truth in several ways, and then discuss those measures:
>>> from sklearn import datasets
>>> from sklearn import cluster
>>> blobs, ground_truth = datasets.make_blobs(1000, centers=3, cluster_std=1.75)
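As a rough sketch of what comparing a fit against the ground truth can look like, the snippet below fits KMeans on the same kind of blobs and scores the predicted labels. The three metrics chosen here (adjusted Rand index, normalized mutual information, and V-measure) are an assumption for illustration, not necessarily the measures this recipe goes on to use. The `random_state` and `n_init` values are likewise assumptions, added only so the run is repeatable. All three scores ignore how the cluster labels happen to be numbered, which matters because KMeans assigns label IDs arbitrarily.

```python
from sklearn import cluster, datasets, metrics

# Same dataset shape as above; random_state is an assumption added for repeatability.
blobs, ground_truth = datasets.make_blobs(
    1000, centers=3, cluster_std=1.75, random_state=0
)

# Fit KMeans with the known number of centers and get a label for each point.
# n_init and random_state are illustrative choices, not values from the text.
kmeans = cluster.KMeans(n_clusters=3, n_init=10, random_state=0)
predicted = kmeans.fit_predict(blobs)

# Score the predicted labels against the ground truth.
# Each of these is invariant to a relabeling of the clusters.
scores = {
    "adjusted_rand": metrics.adjusted_rand_score(ground_truth, predicted),
    "normalized_mutual_info": metrics.normalized_mutual_info_score(
        ground_truth, predicted
    ),
    "v_measure": metrics.v_measure_score(ground_truth, predicted),
}

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

A score near 1 means the clusters line up closely with the annotated classes. Values well below 1 are expected when the blobs overlap, as they may with a `cluster_std` of 1.75.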
Before we ...