Learning Apache Mahout

Book Description

Acquire practical skills in Big Data Analytics and explore data science with Apache Mahout

In Detail

In the past few years, the generation of data and our capability to store and process it have grown exponentially. There is a need for scalable analytics frameworks, and for people with the right skills to extract the information hidden in this Big Data. Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It implements machine learning algorithms on top of distributed processing platforms such as Hadoop and Spark.

Starting with the basics of Mahout and machine learning, you will explore prominent algorithms and their implementation in Mahout. You will learn about Mahout's building blocks, addressing feature extraction, feature reduction, and the curse of dimensionality; delve into classification use cases with the random forest and naïve Bayes classifiers, as well as item-based and user-based recommendation; and then work with clustering in Mahout using the k-means algorithm and implement Mahout without MapReduce. Finish with a flourish by exploring end-to-end use cases on customer analytics and text analytics to gain real-life, practical know-how of analytics projects.

What You Will Learn

  • Configure Mahout on Linux systems and set up the development environment

  • Become familiar with the Mahout command line utilities and Java APIs

  • Understand the core concepts of machine learning and the classes that implement them

  • Integrate Apache Mahout with newer platforms such as Apache Spark

  • Solve classification, clustering, and recommendation problems with Mahout

  • Explore frequent pattern mining and topic modeling, two important application areas of machine learning

  • Understand feature extraction, reduction, and the curse of dimensionality

Downloading the example code for this book: you can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code files sent to you.

Table of Contents

    1. Learning Apache Mahout
      1. Table of Contents
      2. Learning Apache Mahout
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Introduction to Mahout
        1. Why Mahout
          1. Simple techniques and more data is better
          2. Sampling is difficult
          3. Community and license
        2. When Mahout
          1. Data too large for single machine
          2. Data already on Hadoop
          3. Algorithms implemented in Mahout
        3. How Mahout
          1. Setting up the development environment
            1. Configuring Maven
            2. Configuring Mahout
            3. Configuring Eclipse with the Maven plugin and Mahout
          2. Mahout command line
            1. A clustering example
              1. Reuter's raw data file
            2. A classification example
          3. Mahout API – a Java program example
            1. The dataset
          4. Parallel versus in-memory execution mode
        4. Summary
      9. 2. Core Concepts in Machine Learning
        1. Supervised learning
          1. Determine the objective
          2. Decide the training data
          3. Create and clean the training set
          4. Feature extraction
          5. Train the models
            1. Bagging
            2. Boosting
          6. Validation
            1. Holdout-set validation
            2. K-fold cross validation
          7. Evaluation
            1. Bias-variance trade-off
            2. Function complexity and amount of training data
            3. Dimensionality of the input space
            4. Noise in data
        2. Unsupervised learning
          1. Cluster analysis
            1. Objective
            2. Feature representation
              1. Feature normalization
                1. Row normalization
                2. Column normalization
                  1. Rescaling
                  2. Standardization
              2. A notion of similarity and dissimilarity
                1. Euclidean distance measure
                2. Squared Euclidean distance measure
                3. Manhattan distance measure
                4. Tanimoto distance measure
            3. Algorithm for clustering
            4. A stopping criteria
          2. Frequent pattern mining
            1. Measures for identifying interesting rules
              1. Support
              2. Confidence
              3. Lift
              4. Conviction
            2. Things to consider
              1. Actionable rules
              2. What association to look for
        3. Recommender system
          1. Collaborative filtering
            1. Cold start
            2. Scalability
            3. Sparsity
          2. Content-based filtering
        4. Model efficacy
          1. Classification
            1. Confusion matrix
            2. ROC curve and AUC
              1. Features of ROC graphs
              2. Evaluating classifier using the ROC curve
                1. Area-based accuracy measure
                2. Euclidean distance comparison
          2. Regression
            1. Mean absolute error
            2. Root mean squared error
            3. R-square
            4. Adjusted R-square
          3. Recommendation system
            1. Score difference
            2. Precision and recall
          4. Clustering
            1. The internal evaluation
              1. The intra-cluster distance
              2. The inter-cluster distance
              3. The Davies–Bouldin index
              4. The Dunn index
            2. The external evaluation
              1. The Rand index
              2. F-measure
        5. Summary
      10. 3. Feature Engineering
        1. Feature engineering
          1. Feature construction
            1. Categorical features
              1. Merging categories
              2. Converting to binary variables
              3. Converting to continuous variables
            2. Continuous features
              1. Binning
              2. Binarization
              3. Feature standardization
                1. Rescaling
                2. Mean standardization
                3. Scaling to unit norm
              4. Feature transformation derived from the problem domain
                1. Ratios
                2. Frequency
                3. Aggregate transformations
                4. Normalization
              5. Mathematical transformations
          2. Feature extraction
          3. Feature selection
            1. Filter-based feature selection
            2. Wrapper-based feature selection
              1. Backward selection
              2. Forward selection
              3. Recursive feature elimination
            3. Embedded feature selection
          4. Dimensionality reduction
        2. Summary
      11. 4. Classification with Mahout
        1. Classification
          1. White box models
          2. Black box models
        2. Logistic regression
          1. Mahout logistic regression command line
            1. Getting the data
            2. Model building via command line
              1. Splitting the dataset
            3. Train the model command line option
              1. Interpreting the output
            4. Testing the model
          2. Prediction
        3. Adaptive regression model
        4. Code example with logistic regression
          1. Train the model
            1. The LogisticRegressionParameter and CsvRecordFactory classes
            2. A code example without the parameter class
          2. Testing the online regression model
          3. Getting predictions from OnlineLogisticRegression
          4. A CrossFoldLearner example
        5. Random forest
          1. Bagging
          2. Random subsets of features
          3. Out-of-bag error estimate
          4. Random forest using the command line
          5. Predictions from random forest
        6. Naïve Bayes classifier
          1. Numeric features with naïve Bayes
            1. Command line
        7. Summary
      12. 5. Frequent Pattern Mining and Topic Modeling
        1. Frequent pattern mining
          1. Building FP Tree
          2. Constructing the tree
          3. Identifying frequent patterns from FP Tree
        2. Importing the Mahout source code into Eclipse
        3. Frequent pattern mining with Mahout
          1. Extending the command line of Mahout
          2. Getting the data
            1. Data description
            2. Frequent pattern mining with Mahout API
              1. MapReduce execution
              2. Linear execution
                1. Formatting the results and computing metrics
          3. Topic modeling using LDA
            1. LDA using the Mahout command line
        4. Summary
      13. 6. Recommendation with Mahout
        1. Collaborative filtering
          1. Similarity measures
            1. Pearson correlation similarity
            2. Euclidean distance similarity
            3. Computing similarity without a preference value
              1. Tanimoto coefficient similarity
              2. Log-likelihood similarity
          2. Evaluating recommender
          3. User-based recommender system
            1. User neighborhood
              1. Fixed size neighborhood
              2. Threshold-based neighborhood
            2. The dataset
            3. Mahout code example
              1. Building the recommender
              2. Evaluating the recommender
          4. Item-based recommender system
            1. Mahout code example
              1. Building the recommender
              2. Evaluating the recommender
          5. Inferring preferences
        2. Summary
      14. 7. Clustering with Mahout
        1. k-means
          1. Deciding the number of clusters
          2. Deciding the initial centroid
            1. Random points
            2. Points from the dataset
            3. Partition by range
            4. Canopy centroids
          3. Advantages and disadvantages
        2. Canopy clustering
        3. Fuzzy k-means
          1. Deciding the fuzzy factor
        4. A Mahout command-line example
          1. Getting the data
          2. Preprocessing the data
          3. k-means
          4. Canopy clustering
          5. Fuzzy k-means
          6. Streaming k-means
        5. A Mahout Java example
          1. k-means
            1. Cluster evaluation
        6. Summary
      15. 8. New Paradigm in Mahout
        1. Moving beyond MapReduce
        2. Apache Spark
          1. Configuring Spark with Mahout
          2. Basics of Mahout Scala DSL
            1. Imports
        3. In-core types
          1. Vector
            1. Initializing a vector inline
            2. Accessing elements of a vector
            3. Setting values of an element
            4. Vector arithmetic
            5. Vector operations with a scalar
          2. Matrix
            1. Initializing the matrix
            2. Accessing elements of a matrix
            3. Setting the matrix column
            4. Copy by reference
        4. Spark Mahout basics
          1. Initializing the Spark context
          2. Optimizer actions
          3. Computational actions
          4. Caching in Spark's block manager
        5. Linear regression with Mahout Spark
        6. Summary
      16. 9. Case Study – Churn Analytics and Customer Segmentation
        1. Churn analytics
          1. Getting the data
          2. Data exploration
            1. Installing R
              1. Summary statistics
              2. Correlation
          3. Feature engineering
          4. Model training and validation
            1. Logistic regression
            2. Adaptive logistic regression
            3. Random forest
          5. Customer segmentation
          6. Preprocessing
            1. Feature extraction
              1. Day calls
              2. Evening calls
              3. International calls
              4. Preprocessing the files
            2. Creating the clusters using fuzzy k-means
            3. Clustering using k-means
            4. Evaluation
        2. Summary
      17. 10. Case Study – Text Analytics
        1. Text analytics
          1. Vector space model
            1. Preprocessing
              1. Tokenization
              2. Stop word removal
              3. Stemming
              4. Preprocessing example
            2. Document indexing
            3. TF-IDF weighting
            4. n-grams
            5. Normalization
        2. Clustering text
          1. The dataset
          2. Feature extraction
          3. The clustering job
        3. Categorizing text
          1. The dataset
          2. Feature extraction
          3. The classification job
        4. Summary
      18. Index