Chapter 9. Unsupervised Learning with MLlib

This chapter covers how to do unsupervised learning with MLlib, Spark's machine learning library.

This chapter is divided into the following recipes:

Clustering using k-means
Dimensionality reduction with principal component analysis
Dimensionality reduction with singular value decomposition

Introduction

The following is Wikipedia's definition of unsupervised learning:

"In machine learning, the problem of unsupervised learning is that of trying to find hidden structure in unlabeled data."

In contrast to supervised learning, where we have labeled data to train an algorithm, unsupervised learning asks the algorithm to find structure in the data on its own. Let's take a look at the following sample dataset: ...

Spark Cookbook by Rishi Yadav