Chapter 7. Handling Big Data

In this chapter, we will cover the following recipes:

Training an online logistic regression model using Apache Mahout
Applying an online logistic regression model using Apache Mahout
Solving simple text-mining problems with Apache Spark
Clustering using the KMeans algorithm with MLlib
Creating a linear regression model with MLlib
Classifying data points with a Random Forest model using MLlib

Introduction

In this chapter, you will see three key technologies from the Big Data ecosystem that are extremely useful for data scientists: Apache Mahout, Apache Spark, and Spark's machine learning library, MLlib.

We will start the chapter with Apache Mahout, a scalable, distributed machine learning platform for classification, regression, ...

Java Data Science Cookbook by Rushdi Shams
