How it works...

We started by defining a Seq data structure to hold a series of tuples, each pairing a label with a feature vector. We then converted that data structure to a DataFrame and passed it to Estimator.fit() to produce a model fitted to the data. We examined the model's parameters and the DataFrame schema to understand the resulting model. Finally, we combined .select() and .predict() to pull the relevant columns out of the DataFrame before looping to display each prediction alongside its expected result.
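The steps above can be sketched roughly as follows. This is a minimal sketch assuming Spark 2.x's DataFrame-based ML API; the sample data and parameter values are illustrative, and model.transform() is used here as the DataFrame-level counterpart of the per-row prediction described in the recipe:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object HowItWorksSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("HowItWorksSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A Seq of (label, feature vector) pairs, converted to a DataFrame.
    val training = Seq(
      (1.0, Vectors.dense(0.0, 1.1, 0.1)),
      (0.0, Vectors.dense(2.0, 1.0, -1.0)),
      (0.0, Vectors.dense(2.0, 1.3, 1.0)),
      (1.0, Vectors.dense(0.0, 1.2, -0.5))
    ).toDF("label", "features")

    // Estimator.fit() produces a fitted model (a Transformer).
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
    val model = lr.fit(training)

    // Examine the model's parameters and the DataFrame schema.
    println(model.explainParams())
    training.printSchema()

    // Select the relevant columns and loop to display each prediction.
    model.transform(training)
      .select("features", "label", "prediction")
      .collect()
      .foreach(println)

    spark.stop()
  }
}
```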

While we don't have to use pipelines (a workflow concept in Spark borrowed from scikit-learn, http://scikit-learn.org/stable/index.html) to run a regression, we decided to expose you to the power of Spark ML pipelines and logistic regression algorithms in ...
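For context, a Spark ML pipeline chains stages so that a single fit() call runs them in order, much like a scikit-learn Pipeline. The fragment below is a hypothetical illustration (the column names x1, x2 and the two-stage layout are assumptions, not taken from the recipe):

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler

// Stage 1: assemble hypothetical raw columns into a feature vector.
val assembler = new VectorAssembler()
  .setInputCols(Array("x1", "x2"))
  .setOutputCol("features")

// Stage 2: the logistic regression estimator.
val lr = new LogisticRegression().setLabelCol("label")

// The pipeline runs each stage in order when fit() is called:
// pipeline.fit(trainingDF) returns a PipelineModel ready for transform().
val pipeline = new Pipeline().setStages(Array(assembler, lr))
```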
