Apache Mahout

Apache Mahout is a scalable, open source machine learning library developed under the Apache Software Foundation. It supports algorithms for clustering, classification, and collaborative filtering on distributed platforms. The project welcomes algorithm contributions from the community, and a contributed algorithm need not be distributed, so some implementations run on a single machine rather than across a cluster.

Tip

Because Apache Mahout accepts single-machine algorithms, study an algorithm's implementation before running it on Hadoop to confirm that it is actually distributed.

Apache Mahout has a few algorithms that are implemented as MapReduce jobs. These algorithms can be run on Hadoop to exploit the parallelism of a distributed cluster. Again, a word of caution ...
