Introduction

In every line of business, from running a small company to building and operating a mission-critical application, certain tasks are common to almost every workflow. This holds true for building robust machine learning systems as well. In Spark machine learning, these tasks range from splitting the data for model development (train, test, validate) to normalizing input feature vectors to creating ML pipelines via the Spark API. This chapter provides a set of recipes to help the reader think through what is actually required to implement an end-to-end machine learning system.
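As a rough illustration of the tasks just mentioned, the sketch below chains a train/test/validate split, feature-vector normalization, and a Spark ML `Pipeline` together. The object name, column names, toy data, and split proportions are all hypothetical, chosen only for the example; the API calls (`randomSplit`, `VectorAssembler`, `Normalizer`, `Pipeline`) are part of Spark 2.x.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{Normalizer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object CommonMLTasks {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("CommonMLTasks")
      .getOrCreate()

    // Hypothetical toy data: a label and two raw feature columns
    val df = spark.createDataFrame(Seq(
      (0.0, 1.0, 10.0),
      (1.0, 2.0, 20.0),
      (0.0, 3.0, 30.0),
      (1.0, 4.0, 40.0)
    )).toDF("label", "f1", "f2")

    // Split the data for model development (proportions are illustrative)
    val Array(train, test, validate) =
      df.randomSplit(Array(0.6, 0.2, 0.2), seed = 42L)

    // Assemble raw columns into a feature vector, then normalize it
    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2"))
      .setOutputCol("features")
    val normalizer = new Normalizer()
      .setInputCol("features")
      .setOutputCol("normFeatures")
      .setP(2.0) // L2 norm

    // Chain the stages into an ML pipeline and fit it on the training split
    val pipeline = new Pipeline().setStages(Array(assembler, normalizer))
    val model = pipeline.fit(train)
    model.transform(train).show(truncate = false)

    spark.stop()
  }
}
```

Later chapters exercise each of these stages individually; the point here is only how naturally they compose into a single pipeline.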

This chapter attempts ...