Chapter 6. Getting Started with Machine Learning Using MLlib

This chapter is divided into the following recipes:

  • Creating vectors
  • Creating a labeled point
  • Creating matrices
  • Calculating summary statistics
  • Calculating correlation
  • Doing hypothesis testing
  • Creating machine learning pipelines using ML

Introduction

The following is Wikipedia's definition of machine learning:

"Machine learning is a scientific discipline that explores the construction and study of algorithms that can learn from data."

Essentially, machine learning uses past data to make predictions about the future. It depends heavily on statistical analysis and methodology.

In statistics, there are four types of measurement scales:

Scale type    Description

Nominal Scale ...
