Mahout in Action by Ellen Friedman, Ted Dunning, Robin Anil, Sean Owen

Chapter 12. Real-world applications of clustering

This chapter covers

  • Clustering like-minded people on Twitter
  • Suggesting tags for an artist on Last.fm using clustering
  • Creating a related-posts feature for a website

You probably picked up this book to learn how clustering can be applied to real-world problems. So far we’ve mostly focused on clustering the Reuters news data set, which has around 20,000 documents, each containing about 1,000 to 2,000 words. That data set isn’t large enough to show Mahout’s ability to scale. In this chapter, we use clustering to solve three interesting problems on much larger data sets.

First, we attempt to use the public tweets from Twitter (http://twitter.com) to find ...
