Summary

In this chapter, we discussed the different clustering algorithms available in Mahout. We examined the concept of k-means to better understand the clustering process, looked at command-line examples of various clustering algorithms, and finally discussed implementing k-means using the Mahout Java API. I encourage you to experiment with different datasets and with different settings of each algorithm to gain a deeper understanding of how to use clustering algorithms.

In the next chapter, we are going to discuss Mahout on top of Apache Spark. Mahout is being ported to Spark for Mahout 1.0, so read the next chapter carefully; it will help you get started with Mahout 1.0 when it is released.
