Linear regression using SparkR

In the following example, we will illustrate how to use SparkR for machine learning. For this, we will use the same dataset of energy efficiency measurements that we used for linear regression in Chapter 5, Bayesian Regression Models:

>library(SparkR)
>sc <- sparkR.init(master="local")
>sqlContext <- sparkRSQL.init(sc)

>#Importing data
>df <- read.csv("/Users/harikoduvely/Projects/Book/Data/ENB2012_data.csv",header = TRUE)
>#Excluding variables Y2, X6, and X8, and dropping the records after row 768, which contain mainly null values
>df <- df[1:768,c(1,2,3,4,5,7,9)]
>#Converting to a SparkR DataFrame
>dfsr <- createDataFrame(sqlContext,df)
>#Fitting a linear model as a GLM with the Gaussian family
>model <- glm(Y1 ~ X1 + X2 + X3 + X4 + X5 + X7,data = dfsr,family = "gaussian")
>summary(model)
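
A GLM with the Gaussian family and its default identity link is ordinary linear regression, which is why the glm call above serves as the linear regression example. The following is a minimal sketch of that equivalence in base R, without Spark. The data frame toy and its coefficients are simulated stand-ins invented for illustration; they are not the energy efficiency dataset, and the printed estimates say nothing about it.

```r
# Hypothetical stand-in data: simulated values, NOT the energy efficiency dataset.
set.seed(1)
n <- 100
toy <- data.frame(X1 = runif(n), X2 = runif(n))
toy$Y1 <- 2 + 3 * toy$X1 - 1.5 * toy$X2 + rnorm(n, sd = 0.1)

# Gaussian GLM with the default identity link.
# stats:: is spelled out so the call still reaches base R when SparkR is attached.
fit_glm <- stats::glm(Y1 ~ X1 + X2, data = toy, family = gaussian())

# Ordinary least squares on the same formula
fit_lm <- stats::lm(Y1 ~ X1 + X2, data = toy)

# The two fits should give the same coefficient estimates up to numerical tolerance
stopifnot(isTRUE(all.equal(coef(fit_glm), coef(fit_lm))))
print(coef(fit_glm))
```

The sketch runs on a local data.frame only, so it shows what the model is rather than how SparkR fits it on a distributed DataFrame.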
