Common Recipes for Implementing a Robust Machine Learning System

In this chapter, we will cover:

  • Spark's basic statistical API to help you build your own algorithms
  • ML pipelines for real-life machine learning applications
  • Normalizing data with Spark
  • Splitting data for training and testing
  • Common operations with the new Dataset API
  • Creating and using RDD versus DataFrame versus Dataset from a text file in Spark 2.0
  • LabeledPoint data structure for Spark ML
  • Getting access to Spark cluster in Spark 2.0+
  • Getting access to Spark cluster pre-Spark 2.0
  • Getting access to SparkContext vis-a-vis SparkSession object in Spark 2.0
  • New model export and PMML markup in Spark 2.0
  • Regression model evaluation using Spark 2.0
  • Binary classification model evaluation ...

Get Apache Spark 2.x Machine Learning Cookbook now with the O’Reilly learning platform.