Evaluation

A clustering algorithm's quality can be estimated with the log-likelihood measure, which indicates how well the identified clusters fit the data. The dataset is split into multiple folds; for each fold, the clusterer is fitted on the remaining data and then scored on the held-out fold. The motivation is that, if the clustering algorithm assigns a high probability to similar data that wasn't used to fit its parameters, then it has probably done a good job of capturing the data structure. Weka offers the ClusterEvaluation class to estimate it, as follows:

double logLikelihood = ClusterEvaluation.crossValidateModel(model, data, 10, new Random(1));
System.out.println(logLikelihood);

It provides the following output:

-8.773410259774291 
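To make the idea behind the measure concrete, here is a minimal, library-free sketch of a cross-validated log likelihood. It is not Weka's implementation, and Weka's internals may differ in detail. It assumes a deliberately simple stand-in for a density-based clusterer: a single one-dimensional Gaussian fitted to the training folds. The class and method names (HeldOutLogLikelihood, crossValidatedLogLikelihood, logGaussian) and the synthetic data are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch only: a single Gaussian stands in for a real
// density-based clusterer such as the one passed to Weka.
class HeldOutLogLikelihood {

    // Log-density of x under a Gaussian with the given mean and variance.
    public static double logGaussian(double x, double mean, double variance) {
        double diff = x - mean;
        return -0.5 * (Math.log(2 * Math.PI * variance) + diff * diff / variance);
    }

    // Average log-density of each value, computed while its fold is held out.
    public static double crossValidatedLogLikelihood(double[] data, int numFolds, Random random) {
        // Shuffle the indices so that fold membership is random.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            order.add(i);
        }
        Collections.shuffle(order, random);

        double total = 0.0;
        for (int fold = 0; fold < numFolds; fold++) {
            // Fit the mean on every value outside the current fold.
            double sum = 0.0;
            int trainCount = 0;
            for (int k = 0; k < order.size(); k++) {
                if (k % numFolds != fold) {
                    sum += data[order.get(k)];
                    trainCount++;
                }
            }
            double mean = sum / trainCount;

            // Fit the variance on the same training values.
            double squared = 0.0;
            for (int k = 0; k < order.size(); k++) {
                if (k % numFolds != fold) {
                    double d = data[order.get(k)] - mean;
                    squared += d * d;
                }
            }
            // Floor the variance to avoid taking the log of zero.
            double variance = Math.max(squared / trainCount, 1e-6);

            // Score only the held-out values with the fitted model.
            for (int k = 0; k < order.size(); k++) {
                if (k % numFolds == fold) {
                    total += logGaussian(data[order.get(k)], mean, variance);
                }
            }
        }
        // Every value is held out exactly once, so this is a per-instance average.
        return total / data.length;
    }

    public static void main(String[] args) {
        // Synthetic data (an assumption for this demo): mean 5, standard deviation 2.
        Random generator = new Random(1);
        double[] data = new double[200];
        for (int i = 0; i < data.length; i++) {
            data[i] = 5.0 + 2.0 * generator.nextGaussian();
        }
        System.out.println(crossValidatedLogLikelihood(data, 10, new Random(1)));
    }
}
```

Values closer to zero mean the fitted model assigns higher density to data it has not seen. The number is most useful for comparing models on the same dataset, for example when choosing the number of clusters, rather than as an absolute score.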
