Chapter 5. Working with Spark MLlib

In this chapter, you will learn about the MLlib component of Spark. We will cover the following recipes:

  • Implementing Naive Bayes classification
  • Implementing decision trees
  • Building a recommendation system
  • Implementing logistic regression using Spark ML pipelines

Introduction

MLlib is the machine learning (ML) library that ships with Apache Spark, the open source, in-memory, cluster-based data processing system. In this chapter, we will examine the algorithms that MLlib provides, grouped by machine learning task, such as classification, recommendation, and neural processing. For each algorithm, we'll provide working examples that tackle real problems. We will take a step-by-step ...
