Getting ready

In this recipe, we pair ML pipelines with logistic regression to show how several steps can be chained into a single pipeline that operates on DataFrames as they are transformed and passed from stage to stage. To keep the program short, we skip some steps, such as splitting the data and evaluating the model, and reserve them for later chapters. The recipe still gives a full treatment of pipelines, DataFrames, estimators, and transformers.

It explores the details of the pipeline and of the DataFrames as they travel through it and are operated on by each stage.
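
As a rough orientation before the recipe itself, the following is a minimal sketch of the idea, not the recipe's actual program. It assumes a local Spark 2.x session and a small made-up text dataset; the object name, the column names, the sample rows, and the choice of Tokenizer and HashingTF as the feature stages are all illustrative assumptions. Two transformers and one estimator (LogisticRegression) are chained into a Pipeline, and each stage adds a column to the DataFrame as it passes through. In line with the recipe, there is no train/test split or model evaluation.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Hypothetical object name; not taken from the book.
object PipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("PipelineSketch")
      .getOrCreate()

    // Made-up toy data (assumption): id, raw text, binary label.
    val training = spark.createDataFrame(Seq(
      (0L, "spark ml pipeline", 1.0),
      (1L, "plain text", 0.0),
      (2L, "spark dataframe estimator", 1.0),
      (3L, "unrelated words here", 0.0)
    )).toDF("id", "text", "label")

    // Transformer: splits "text" into a "words" array column.
    val tokenizer = new Tokenizer()
      .setInputCol("text")
      .setOutputCol("words")

    // Transformer: hashes "words" into a fixed-size "features" vector.
    val hashingTF = new HashingTF()
      .setInputCol("words")
      .setOutputCol("features")
      .setNumFeatures(1000)

    // Estimator: reads the default "features" and "label" columns.
    val lr = new LogisticRegression().setMaxIter(10)

    // Chain the stages; fit() runs them in order and returns a PipelineModel.
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
    val model = pipeline.fit(training)

    // The fitted pipeline is itself a transformer: each stage appends columns.
    model.transform(training)
      .select("id", "text", "words", "features", "prediction")
      .show(false)

    spark.stop()
  }
}
```

Selecting the intermediate columns in the final step is a deliberate choice here: it makes visible what each stage contributed to the DataFrame on its way through the pipeline.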
