Introduction

This chapter and the next cover the fundamental techniques for regression and classification available in the Spark 2.0 ML and MLlib libraries. Spark 2.0 signals a new direction by moving the RDD-based regression APIs (see the next chapter) to maintenance mode while emphasizing Linear Regression and Generalized Regression going forward.

At a high level, the new API design favors parameterizing elastic net to produce ridge regression, Lasso regression, and everything in between, rather than exposing a separately named API for each variant (for example, LassoWithSGD). This is a much cleaner design, and it forces you to learn elastic net and its power in feature engineering, which remains an art in data science. We provide adequate ...
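To make the "ridge versus Lasso and everything in between" idea concrete, the elastic net objective for linear regression can be sketched as follows. The notation is an assumption made for illustration and does not come from the text above: $w$ is the weight vector, $(x_i, y_i)$ are the $n$ training examples, $\lambda \ge 0$ is the overall regularization strength, and $\alpha \in [0, 1]$ is the mixing parameter. The intercept is omitted for brevity.

```latex
\min_{w} \;
  \frac{1}{2n} \sum_{i=1}^{n} \left( w^{\top} x_i - y_i \right)^2
  \;+\;
  \lambda \left( \alpha \, \lVert w \rVert_1
               + \frac{1 - \alpha}{2} \, \lVert w \rVert_2^2 \right)
% alpha = 0  ->  only the L2 term remains (ridge regression)
% alpha = 1  ->  only the L1 term remains (Lasso regression)
% 0 < alpha < 1  ->  a blend of the two penalties
```

In the spark.ml `LinearRegression` estimator, $\lambda$ and $\alpha$ correspond to the `regParam` and `elasticNetParam` parameters, so a single estimator covers the cases that the RDD-based API split across separately named classes.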
