The Apache Mahout project aims to build a scalable machine learning library. It is built atop scalable, distributed architectures such as Hadoop and uses the MapReduce paradigm: an approach to processing and generating large datasets with a parallel, distributed algorithm running on a cluster of servers.
Mahout provides a console interface and a Java API to scalable algorithms for clustering, classification, and collaborative filtering. It can be used to solve three common business problems:
- Item recommendation: recommending items to users, as in "people who liked this movie also liked ..."
- Clustering: sorting text documents into groups of topically related documents
- Classification: learning which topic to assign to an unlabelled document
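The first of these problems, item recommendation, can be illustrated with a minimal sketch of item co-occurrence scoring, the basic idea behind "people who liked this also liked". This is plain Java with hypothetical data, not the Mahout API itself: items liked by users with overlapping tastes are scored by how strongly those users' preferences overlap with the target user's.

```java
import java.util.*;
import java.util.stream.Collectors;

// Minimal co-occurrence recommender sketch (illustrative only, not Mahout code).
public class CooccurrenceRecommender {

    // Recommend up to topN items the given user has not yet liked,
    // scored by the taste overlap of the users who liked them.
    public static List<String> recommend(Map<String, Set<String>> userLikes,
                                         String user, int topN) {
        Set<String> liked = userLikes.getOrDefault(user, Collections.emptySet());
        Map<String, Integer> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : userLikes.entrySet()) {
            if (e.getKey().equals(user)) continue;
            Set<String> other = e.getValue();
            // Overlap size measures how similar the other user's taste is.
            int overlap = 0;
            for (String item : other) if (liked.contains(item)) overlap++;
            if (overlap == 0) continue;
            // Credit each item the similar user liked that our user hasn't seen.
            for (String item : other)
                if (!liked.contains(item)) scores.merge(item, overlap, Integer::sum);
        }
        return scores.entrySet().stream()
                .sorted((a, b) -> b.getValue() - a.getValue())
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical preference data.
        Map<String, Set<String>> likes = new HashMap<>();
        likes.put("alice", Set.of("Matrix", "Inception"));
        likes.put("bob",   Set.of("Matrix", "Inception", "Interstellar"));
        likes.put("carol", Set.of("Matrix", "Interstellar"));
        // Interstellar scores 2 (from bob) + 1 (from carol) = 3 for alice.
        System.out.println(recommend(likes, "alice", 2)); // prints [Interstellar]
    }
}
```

Mahout's own collaborative-filtering algorithms follow the same principle but compute such similarities at scale, distributing the work across a Hadoop cluster via MapReduce.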