Lasso regression with SGD optimization in Spark 2.0

In this recipe, we use the housing dataset from the previous recipes to demonstrate shrinkage with LassoWithSGD(), Spark's RDD-based lasso regression. Lasso selects a subset of parameters by driving the weights of the others to exactly zero (eliminating them once they fall below a threshold) while shrinking the weights that remain (regularization). We emphasize again that ridge regression, by contrast, reduces parameter weights but never sets them to zero.
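The contrast between the two penalties can be seen in a small pure-Python sketch. It is not Spark's implementation and does not use the housing dataset: the four toy rows, the `fit` and `soft_threshold` helpers, and the values of `reg`, `step`, and `iterations` are all assumptions made for illustration. It runs full-batch gradient descent with a constant step size, applying soft-thresholding for the L1 (lasso) penalty and a multiplicative decay for the L2 (ridge) penalty.

```python
# Toy data (hypothetical, not the housing dataset): y = 2 * x1 exactly,
# while x2 is an irrelevant feature.
ROWS = [((1.0, 1.0), 2.0), ((2.0, -1.0), 4.0), ((3.0, 1.0), 6.0), ((4.0, -1.0), 8.0)]


def soft_threshold(w, shrink):
    """L1 step: pull w toward zero by `shrink` and clamp at exactly zero."""
    if w > shrink:
        return w - shrink
    if w < -shrink:
        return w + shrink
    return 0.0


def fit(rows, penalty, reg=0.1, step=0.05, iterations=500):
    """Full-batch gradient descent on halved mean squared error with an L1 or L2 penalty."""
    n = len(rows)
    dim = len(rows[0][0])
    weights = [0.0] * dim
    for _ in range(iterations):
        # Gradient of (1 / 2n) * sum((w . x - y)^2) with respect to each weight.
        grad = [0.0] * dim
        for features, label in rows:
            error = sum(w * x for w, x in zip(weights, features)) - label
            for j, x in enumerate(features):
                grad[j] += error * x / n
        stepped = [w - step * g for w, g in zip(weights, grad)]
        if penalty == "l1":
            # Lasso: soft-thresholding can land a weight on exactly 0.0.
            weights = [soft_threshold(s, step * reg) for s in stepped]
        elif penalty == "l2":
            # Ridge: proportional decay shrinks a weight but does not zero it.
            weights = [s - step * reg * w for s, w in zip(stepped, weights)]
        else:
            raise ValueError("penalty must be 'l1' or 'l2'")
    return weights


lasso_weights = fit(ROWS, "l1")
ridge_weights = fit(ROWS, "l2")
print("lasso:", lasso_weights)
print("ridge:", ridge_weights)
```

On this toy data, the lasso weight for the irrelevant feature x2 ends at exactly 0.0, while the ridge weight for x2 stays small but nonzero. Both methods shrink the weight for x1 slightly below its unpenalized value of 2.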

LassoWithSGD() is Spark's RDD-based lasso (Least Absolute Shrinkage and Selection Operator) API. Lasso is a regression method that performs variable selection and regularization at the same time in order to eliminate non-contributing explanatory variables ...
