Functions for Big Data Sets
If you’re working with a very large data set, you may not have enough memory to use the standard regression functions. Luckily, the biglm package provides an alternative set of regression functions for big data sets. These functions are slower than the standard ones, but they still work when the data is too large to fit in memory:
library(biglm)
# substitute for lm, works on data frames
biglm(formula, data, weights=NULL, sandwich=FALSE)
# substitute for glm, works on data frames
bigglm(formula, data, family=gaussian(), weights=NULL, sandwich=FALSE,
       maxit=8, tolerance=1e-7, start=NULL, quiet=FALSE, ...)
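As a minimal sketch of how these functions avoid holding everything in memory, a biglm fit can be built up one chunk at a time with update(). The example below assumes the biglm package is installed and uses the built-in mtcars data set, split into four pieces, as a stand-in for data that is too large to load at once; the names chunks and fit are chosen only for illustration.

```r
library(biglm)

# Split a built-in data set into four chunks of eight rows each,
# mimicking data that arrives in pieces.
chunks <- split(mtcars, rep(1:4, each = 8))

# Fit on the first chunk, then fold in the remaining chunks one at a time.
fit <- biglm(mpg ~ wt + hp, data = chunks[[1]])
for (chunk in chunks[-1]) {
  fit <- update(fit, chunk)
}

# The coefficients should agree with lm() fitted on the full data frame.
coef(fit)
coef(lm(mpg ~ wt + hp, data = mtcars))
```

Only one chunk needs to be in memory at any point, so with real data each chunk could be read from a file inside the loop instead of being split from an existing data frame.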
It’s even possible to use bigglm on data sets inside a database. To do this, you would open a database connection using RODBC or RSQLite and then call bigglm with the data argument specifying the database connection and tablename specifying the table in which to evaluate the formula:
bigglm(formula, data, family=gaussian(), tablename, ..., chunksize=5000)
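The database workflow might look like the following sketch. It assumes the biglm, DBI, and RSQLite packages are installed, uses a temporary in-memory SQLite database, and copies mtcars into a table named cars purely for illustration; the chunksize of 10 is deliberately small so that the table is read in several passes.

```r
library(biglm)
library(DBI)
library(RSQLite)

# Open an in-memory SQLite database and load an example table into it.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "cars", mtcars)

# Fit the model against the table, fetching 10 rows at a time.
fit_db <- bigglm(mpg ~ wt + hp, data = con, family = gaussian(),
                 tablename = "cars", chunksize = 10)

dbDisconnect(con)

# Inspect the fitted coefficients.
coef(fit_db)
```

Because bigglm fetches rows from the connection in chunks, the full table never has to be loaded into an R data frame.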