Mahout in Action by Ellen Friedman, Ted Dunning, Robin Anil, Sean Owen

Chapter 9. Clustering algorithms in Mahout

This chapter covers

  • K-means clustering
  • Centroid generation using canopy clustering
  • Fuzzy k-means clustering and Dirichlet clustering
  • Topic modeling using latent Dirichlet allocation as a variant of clustering

Now that you know how input data is represented as Vectors and how SequenceFiles are created as input for the clustering algorithms, you’re ready to explore the various clustering algorithms that Mahout provides. There are many clustering algorithms in Mahout, and some work well for a given data set whereas others don’t. K-means is a generic clustering algorithm that can be molded easily to fit almost all situations. It’s also simple to understand and can easily be executed on parallel computers. ...
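To make the claim that k-means is simple to understand concrete, here is a minimal sketch of the algorithm in plain Python. It isn't Mahout's implementation and uses none of Mahout's API; the function names (assign, update, kmeans), the Euclidean distance measure, and the sample points are assumptions chosen only for illustration. The assignment step handles each point independently of every other point, which is one reason the algorithm lends itself to parallel execution.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point.

    Each point is handled independently, so this step could be split
    across workers (illustrative only; not how Mahout is invoked).
    """
    return [
        min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centroids.append(old)  # an empty cluster keeps its centroid
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members)
                  for d in range(len(old)))
        )
    return new_centroids


def kmeans(points, centroids, max_iterations=10):
    """Alternate the two steps until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    labels = assign(points, centroids)
    for _ in range(max_iterations):
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
        labels = assign(points, centroids)
    return centroids, labels


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(points, centroids=[(1, 1), (8, 8)])
```

The initial centroids are passed in explicitly here so the result is repeatable. How good the starting centroids are has a large effect on the result, which is why centroid generation appears as its own topic in the list of what this chapter covers.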
