Linear regression with SGD optimization in Spark 2.0

In this recipe, we use the Spark RDD-based regression API to demonstrate how an iterative optimization technique can minimize the cost function and arrive at a solution for a linear regression problem.

We examine how Spark converges on a solution to the regression problem with the well-known iterative optimization technique of gradient descent. Spark provides a more practical variant, Stochastic Gradient Descent (SGD), which is used to compute the intercept (in this case set to 0) and the weights for the parameters.
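The following is a minimal sketch of this approach, assuming the RDD-based LinearRegressionWithSGD trainer from spark.mllib (still available, though deprecated, in Spark 2.x); the toy data, step size, and iteration count are illustrative choices, not values from the recipe:

import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

object LinearRegressionSGDSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("LinearRegressionSGDSketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Toy data where the label is roughly 2 * feature (illustrative only)
    val data = sc.parallelize(Seq(
      LabeledPoint(2.0, Vectors.dense(1.0)),
      LabeledPoint(4.1, Vectors.dense(2.0)),
      LabeledPoint(5.9, Vectors.dense(3.0)),
      LabeledPoint(8.2, Vectors.dense(4.0)),
      LabeledPoint(9.9, Vectors.dense(5.0))
    )).cache()

    // Hyperparameters for the iterative optimizer (example values)
    val numIterations = 100
    val stepSize = 0.01

    // Train with the RDD-based SGD regressor; by default the intercept
    // is not fitted, so it stays at 0.0 and only the weights are learned
    val model = LinearRegressionWithSGD.train(data, numIterations, stepSize)

    println(s"Intercept: ${model.intercept}")
    println(s"Weights:   ${model.weights}")

    spark.stop()
  }
}

With this setup, the learned weight should approach 2.0 as SGD iteratively reduces the squared-error cost on the toy data.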
