O'Reilly logo

Mahout in Action by Ellen Friedman, Ted Dunning, Robin Anil, Sean Owen

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 10. Evaluating and improving clustering quality

This chapter covers

  • Inspecting clustering output
  • Evaluating the quality of clustering
  • Improving clustering quality

We saw many types of clustering algorithms in the last chapter: k-means, canopy, fuzzy k-means, Dirichlet, and latent Dirichlet analysis (LDA). They all performed well on certain types of data and sometimes poorly on others. The most natural question that comes to mind after every clustering job is, “How well did the algorithm perform on the data?”

Analyzing the output of clustering is an important exercise. It can be done with simple command-line tools or richer GUI-based visualizations. Once the clusters are visualized and problem areas are identified, these results ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required