Chapter 9. Clustering algorithms in Mahout

This chapter covers

  • K-means clustering
  • Centroid generation using canopy clustering
  • Fuzzy k-means clustering and Dirichlet clustering
  • Topic modeling using latent Dirichlet allocation as a variant of clustering

Now that you know how input data is represented as Vectors and how SequenceFiles are created as input for the clustering algorithms, you’re ready to explore the clustering algorithms that Mahout provides. Mahout offers many of them, and an algorithm that works well on one data set may work poorly on another. K-means is a generic clustering algorithm that can be adapted to fit almost any situation. It’s also simple to understand and straightforward to run in parallel. ...
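
To make the k-means idea concrete, here is a minimal sketch in plain Java. It does not use Mahout's API: the class name KMeansSketch, its method names, and the sample points are all hypothetical and chosen only for illustration. It assumes squared Euclidean distance and caller-supplied starting centroids. Each iteration assigns every point to its nearest centroid and then moves each centroid to the mean of its assigned points. Because each point is assigned independently and only per-cluster sums and counts need to be combined, the algorithm lends itself to parallel execution.

```java
import java.util.Arrays;

// Illustrative only: a single-machine k-means sketch, not Mahout's implementation.
public class KMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to the given point. */
    public static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Runs k-means from the given starting centroids and returns the final centroids. */
    public static double[][] cluster(double[][] points, double[][] initial, int maxIterations) {
        int k = initial.length;
        int dim = initial[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = initial[c].clone(); // leave the caller's array untouched
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            // Assignment step: every point is handled independently of the others.
            for (double[] p : points) {
                int c = nearest(p, centroids);
                counts[c]++;
                for (int d = 0; d < dim; d++) {
                    sums[c][d] += p[d];
                }
            }
            // Update step: move each centroid to the mean of its assigned points.
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dim; d++) {
                    double mean = sums[c][d] / counts[c];
                    if (mean != centroids[c][d]) {
                        changed = true;
                    }
                    centroids[c][d] = mean;
                }
            }
            if (!changed) {
                break; // converged: no centroid moved in this iteration
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two well-separated groups of 2D points.
        double[][] points = {{1, 1}, {1, 2}, {2, 1}, {8, 8}, {9, 8}, {8, 9}};
        double[][] start = {{0, 0}, {10, 10}};
        System.out.println(Arrays.deepToString(cluster(points, start, 10)));
    }
}
```

With these sample points and starting centroids, the first centroid settles at the mean of the three points near (1, 1) and the second at the mean of the three points near (8, 8). In practice the result depends on the starting centroids, which is why a separate centroid-generation step such as canopy clustering is useful.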
