Chapter 7. Handling Big Data

In this chapter, we will cover the following recipes:

  • Training an online logistic regression model using Apache Mahout
  • Applying an online logistic regression model using Apache Mahout
  • Solving simple text-mining problems with Apache Spark
  • Clustering using the KMeans algorithm with MLlib
  • Creating a linear regression model with MLlib
  • Classifying data points with a Random Forest model using MLlib

Introduction

In this chapter, you will see three key Big Data technologies that are especially useful for data scientists: Apache Mahout, Apache Spark, and Spark's machine learning library, MLlib.

We will start our chapter with Apache Mahout, a scalable, distributed machine learning platform for classification, regression, ...