Chapter 5. Working with Spark MLlib

In this chapter, you will learn about the MLlib component of Spark. We will cover the following recipes:

Implementing Naive Bayes classification
Implementing decision trees
Building a recommendation system
Implementing logistic regression using Spark ML pipelines

Introduction

MLlib is the machine learning (ML) library provided with Apache Spark, the in-memory, cluster-based, open source data processing system. In this chapter, we will examine the algorithms that MLlib provides, grouped by machine learning task: classification, recommendation, and neural processing. For each algorithm, we'll provide working examples that tackle real problems. We will take a step-by-step ...

Apache Spark for Data Science Cookbook by Padma Priya Chitturi
