Multivariate linear regression in Apache Spark

Returning to our case study, let's now develop a multivariate linear regression model to predict the total number of daily bike renters, using our bike sharing dataset and a subset of its independent variables.

The following subsections describe each of the pertinent cells in the corresponding Jupyter notebook for this use case, chp04-02-multivariate-linear-regression.ipynb, which may be found in the GitHub repository accompanying this book. For the sake of brevity, we will skip those cells that perform the same functions as those seen previously.
  1. First, let's demonstrate how we can use Spark to calculate the correlation value between our dependent variable, cnt, and each independent ...
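
As a rough illustration of this first step, the sketch below computes the Pearson correlation between a dependent column and each candidate independent column. It is not the notebook's own code. It assumes a Spark DataFrame is passed in and relies only on DataFrame.stat.corr, which PySpark provides for pairs of numeric columns. The helper names correlations_with_target and rank_by_strength are hypothetical and introduced here purely for illustration.

```python
from typing import Dict, List


def correlations_with_target(df, target: str, features: List[str]) -> Dict[str, float]:
    """Pearson correlation between `target` and each column in `features`.

    `df` is assumed to be a pyspark.sql.DataFrame. DataFrame.stat.corr
    returns the Pearson coefficient for two numeric columns.
    """
    return {name: df.stat.corr(target, name) for name in features}


def rank_by_strength(correlations: Dict[str, float]) -> List[str]:
    """Order feature names from strongest to weakest absolute correlation."""
    return sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
```

With the bike sharing data loaded into a DataFrame (called bike_sharing_df here as an assumption), a call such as correlations_with_target(bike_sharing_df, "cnt", ["temp", "hum", "windspeed"]) would return one coefficient per column. The column names other than cnt are assumed to match the dataset's headers. Ranking by absolute value is one simple way to shortlist independent variables, since a strong negative correlation is as informative as a strong positive one.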
