Doing ridge regression

Ridge regression is an alternative to lasso for improving prediction quality. Lasso sets many feature coefficients to exactly zero, which eliminates those features from the equation. Ridge also penalizes the coefficients of the predictors (features) and shrinks them, but it never sets them to exactly zero.
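The difference between the two penalties can be written out explicitly. The following is a sketch in generic notation, not taken from the recipe itself: n is the number of training points, x_i and y_i are the features and label of point i, w is the coefficient vector, and λ is the regularization strength. The exact scaling constants vary between libraries and are an assumption here.

```latex
% Lasso: squared-error loss plus an L1 penalty on the coefficients
\min_{w} \; \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - w^{\top} x_i \right)^2 + \lambda \lVert w \rVert_1

% Ridge: the same loss plus a squared L2 penalty on the coefficients
\min_{w} \; \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - w^{\top} x_i \right)^2 + \frac{\lambda}{2} \lVert w \rVert_2^2
```

The L1 term has a kink at zero, which is what lets lasso push coefficients to exactly zero. The squared L2 term is smooth, so ridge only shrinks coefficients toward zero.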

How to do it…

  1. Start the Spark shell:
    $ spark-shell
    
  2. Import the vector, labeled point, and ridge regression classes:
    scala> import org.apache.spark.mllib.linalg.Vectors
    scala> import org.apache.spark.mllib.regression.LabeledPoint
    scala> import org.apache.spark.mllib.regression.RidgeRegressionWithSGD
    
  3. Create the LabeledPoint array with the house price as the label:
    scala> val points = Array(
    LabeledPoint(1,Vectors.dense(5,3,1,2,1,3,2,2,1)),
    LabeledPoint(2,Vectors.dense(9,8,8,9,7,9,8,7,9)) ...
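
For orientation, here is a minimal sketch of how an array of labeled points is typically turned into an RDD and passed to `RidgeRegressionWithSGD`. It does not reproduce the remaining steps of the recipe. The data in `toyPoints`, the names `toyPoints`, `toyRDD`, and `toyModel`, and the hyperparameter values are all hypothetical. The sketch assumes it is pasted into a `spark-shell` session (which provides `sc`) on an older Spark release (1.x or 2.x) where this `mllib` class is still available.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, RidgeRegressionWithSGD}

// Hypothetical toy data: label first, then a dense feature vector.
val toyPoints = Array(
  LabeledPoint(1.0, Vectors.dense(1.0, 0.5)),
  LabeledPoint(2.0, Vectors.dense(2.0, 1.0)),
  LabeledPoint(3.0, Vectors.dense(3.0, 1.5))
)

// Distribute the array as an RDD; `sc` is the SparkContext created by spark-shell.
val toyRDD = sc.parallelize(toyPoints).cache()

// Illustrative hyperparameters (assumptions, not tuned values).
val numIterations = 100
val stepSize = 0.01
val regParam = 0.1 // strength of the L2 (ridge) penalty

// Train a ridge regression model with stochastic gradient descent.
val toyModel = RidgeRegressionWithSGD.train(toyRDD, numIterations, stepSize, regParam)

// Inspect the learned coefficients and predict for a new feature vector.
println(toyModel.weights)
println(toyModel.predict(Vectors.dense(2.5, 1.25)))
```

Raising `regParam` in this sketch shrinks the learned weights further, but, in line with the description of ridge above, does not force them to exactly zero.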
