C

Datasets

Besides the tiny weather family of datasets presented in Chapter 1 and the artificially generated datasets in some chapters, the R code examples use a set of real datasets originating from various sources. They are all available for download from the UCI Machine Learning Repository. Except for those used by the case studies in Chapter 20, the datasets do not actually have to be downloaded from the repository, since they are also available in the R packages mlbench and datasets. It still makes sense to check the corresponding UCI pages for basic characteristics of the data as well as information about their origin and past usage. The table below lists all the UCI datasets used in this book, providing their original repository names as well as their R package names, where available. The corresponding links to the UCI pages can be constructed using the following simple template:

http://archive.ics.uci.edu/ml/datasets/name

with name replaced by the UCI dataset name from the table below.
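The template can be applied mechanically. The following is a minimal sketch, not code from the book: the helper name uci_url is hypothetical, and the three UCI names passed to it are taken from the dataset table in this appendix.

```r
# Hypothetical helper (an assumption for illustration, not from the book):
# builds a UCI page link by appending the UCI dataset name to the template prefix.
uci_url <- function(name) {
  paste0("http://archive.ics.uci.edu/ml/datasets/", name)
}

# UCI names exactly as listed in the dataset table
uci_names <- c("Covertype", "Glass+Identification", "Soybean+(Large)")

# One URL per dataset name
urls <- vapply(uci_names, uci_url, character(1), USE.NAMES = FALSE)
print(urls)
```

The UCI names already contain the + characters used in place of spaces, so they are pasted into the template unchanged.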

Dataset                 UCI name                        R package/name
Census Income           Census-Income+(KDD)
Communities and Crime   Communities+and+Crime
Cover Type              Covertype
Boston Housing          Housing                         mlbench/BostonHousing
Glass                   Glass+Identification            mlbench/Glass
HouseVotes84            Congressional+Voting+Records    mlbench/HouseVotes84
Iris                    Iris                            datasets/iris
Pima Indians Diabetes   Pima+Indians+Diabetes           mlbench/PimaIndiansDiabetes
Soybean                 Soybean+(Large)                 mlbench/Soybean
Vehicle Silhouettes     Statlog+(Vehicle+Silhouettes)   mlbench/Vehicle
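For the datasets with an entry in the R package/name column, loading from the package is enough. The sketch below is one way to do it, assuming the mlbench package has been installed (for example with install.packages("mlbench")); the guard around the mlbench part is an addition for illustration so that the code still runs when the package is missing.

```r
# iris ships with the datasets package, which is part of a standard R installation
data(iris, package = "datasets")
print(dim(iris))

# The remaining packaged datasets come from mlbench; the names follow the table.
# The requireNamespace() guard is an illustrative assumption, not from the book.
mlbench_names <- c("BostonHousing", "Glass", "HouseVotes84",
                   "PimaIndiansDiabetes", "Soybean", "Vehicle")

if (requireNamespace("mlbench", quietly = TRUE)) {
  # Load all listed datasets into the current environment
  data(list = mlbench_names, package = "mlbench")
  # Report the number of rows and columns of each loaded dataset
  for (nm in mlbench_names) {
    cat(nm, ":", dim(get(nm)), "\n")
  }
} else {
  message("mlbench is not installed; only iris was loaded")
}
```

The datasets without a package entry in the table have to be obtained from their UCI pages instead.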
