Part 2. Clustering

This part of the book, comprising chapters 7 through 12, explores clustering algorithms in Apache Mahout. With the techniques described here, you can group similar pieces of data into sets called clusters. Clustering helps uncover interesting groups of information in a large volume of data. We begin with simple clustering problems, illustrated with examples written in Java. As we progress, you’ll see more real-world examples and learn how to run Apache Mahout clustering as Hadoop jobs that can cluster large data sets easily.

Chapter 7 introduces the notion of clustering and illustrates it by clustering points in a two-dimensional plane. Chapter 8 introduces the concept of vectors and explains ...
