There's more...

The pipeline in Spark ML was inspired by scikit-learn in Python, which is referenced here for completeness:

http://scikit-learn.org/stable/

ML pipelines make it easy to combine the multiple algorithms needed to implement a production task in Spark. A real-life use case rarely consists of a single algorithm; more often, several cooperating ML algorithms work together to solve a complex problem. For example, in LDA-based systems (such as news briefings) or in human emotion detection, a number of steps before and after the core system must be implemented as a single pipeline to produce a meaningful, production-worthy system. See the following link for a real-life use case requiring ...
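To make the idea of chaining steps before and after a core algorithm concrete, here is a minimal sketch in plain Python. It is not the Spark ML or scikit-learn API: the class names (`Tokenizer`, `StopWordsRemover`, `VocabularyCounter`, `SimplePipeline`), the sample sentences, and the stop-word list are all hypothetical stand-ins chosen for illustration. The sketch only assumes the general pattern both libraries share: stages that transform data, stages that must first be fitted, and a pipeline object that runs them in order.

```python
from collections import Counter


class Tokenizer:
    """Transformer stage: splits each document into lowercase tokens."""

    def transform(self, docs):
        return [doc.lower().split() for doc in docs]


class StopWordsRemover:
    """Transformer stage: drops tokens found in a fixed stop-word set."""

    def __init__(self, stop_words):
        self.stop_words = set(stop_words)

    def transform(self, docs):
        return [[t for t in tokens if t not in self.stop_words]
                for tokens in docs]


class VocabularyCounter:
    """Estimator stage: learns a vocabulary in fit(), then counts known terms."""

    def fit(self, docs):
        self.vocabulary_ = sorted({t for tokens in docs for t in tokens})
        return self

    def transform(self, docs):
        known = set(self.vocabulary_)
        return [Counter(t for t in tokens if t in known) for tokens in docs]


class SimplePipeline:
    """Hypothetical pipeline: runs its stages in the order they are given."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, docs):
        data = docs
        for stage in self.stages:
            # Stages that learn from data are fitted on the output
            # of the stages that precede them.
            if hasattr(stage, "fit"):
                stage.fit(data)
            data = stage.transform(data)
        return self

    def transform(self, docs):
        data = docs
        for stage in self.stages:
            data = stage.transform(data)
        return data


def demo():
    # Made-up training sentences, used only to learn a vocabulary.
    train = ["Spark makes the pipeline simple",
             "The pipeline chains the stages"]
    pipeline = SimplePipeline([
        Tokenizer(),
        StopWordsRemover(["the", "a"]),
        VocabularyCounter(),
    ]).fit(train)
    # New text passes through exactly the same sequence of steps.
    return pipeline.transform(
        ["the pipeline uses Spark and the pipeline scales"])
```

Calling `demo()` fits all three stages once and then pushes a new sentence through the same chain, so the preprocessing applied at training time cannot drift from what is applied later. That consistency is the main reason to package the steps as a single unit rather than invoke them by hand.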
