Summary

This chapter covered the basics of Apache Spark that machine learning professionals are expected to understand before using Spark in practical machine learning projects. We focused on Spark computing and related it to some of the most important machine learning components, connecting the two so that readers are prepared for machine learning projects.

First, we provided an overview of Spark and discussed its advantages and its computing model for machine learning.

Second, we reviewed machine learning algorithms, Spark's MLlib libraries, and other machine learning libraries.

In the third section, Spark's core innovations of RDD and DataFrame ...
