R for Data Science by Dan Toomey

Dataset

Machine learning works by splitting a dataset into a training section and a testing section. We use the training data to build our model. We can then test that model against the remaining testing data.
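As a minimal sketch of that split, the following uses a small made-up data frame rather than the housing data. The names df, train_idx, training, and testing, the 75/25 ratio, and the seed are all assumptions chosen for illustration, not values from the book.

```r
# Hypothetical stand-in data: 100 observations of 2 variables
set.seed(42)                      # make the random split reproducible
df <- data.frame(x = rnorm(100), y = rnorm(100))

# Randomly pick 75% of the row indices for training (the ratio is an assumption)
train_idx <- sample(nrow(df), size = floor(0.75 * nrow(df)))

training <- df[train_idx, ]       # rows used to build the model
testing  <- df[-train_idx, ]      # held-out rows used to test it

nrow(training)   # 75
nrow(testing)    # 25
```

Sampling row indices without replacement guarantees that no observation lands in both sections, so the test data stays unseen while the model is built.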

The first step is finding a dataset with several variables and, ideally, several hundred observations. I am using the housing data from http://uci.edu. Let's load the dataset and name its columns using the following commands:

> housing <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data")
> colnames(housing) <- c("CRIM","ZN","INDUS","CHAS","NOX","RM","AGE","DIS","RAD","TAX","PRATIO","B","LSTAT","MDEV")
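The housing file has no header row, so read.table assigns default column names (V1, V2, and so on), which is why the colnames call is needed. The sketch below shows the same pattern on two made-up rows of inline text. The values, the raw and demo variables, and the names A, B, and C are hypothetical and are not taken from the dataset.

```r
# Two made-up whitespace-separated rows standing in for a header-less file
raw <- "1.5 10 0
2.5 20 1"

demo <- read.table(text = raw)      # header = FALSE is the default
names(demo)                         # "V1" "V2" "V3"

colnames(demo) <- c("A", "B", "C")  # hypothetical descriptive names
dim(demo)                           # 2 3 (rows, columns)
```

Calling dim on the real housing data frame in the same way reports its number of observations and variables.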

There are close to 500 observations with 14 variables. We can see a summary for ...