Useful datasets

One of the best data sources is the UCI Machine Learning Repository. When we go to the web page at https://archive.ics.uci.edu/ml/datasets.html, we see the following list:

For example, if we click the first dataset (Abalone), we see the following. To save space, only the top part is shown:

From the web page, users can download the dataset and find definitions of variables and even citations. The code that follows can be used to download a related R dataset:

dataSet<-"UCIdatasets"
path<-"http://canisius.edu/~yany/RData/"
con<-paste(path,dataSet,".RData",sep='')
...
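The snippet above is cut off after the URL string is built, so the loading step is not shown. As a minimal sketch of one common way to read an .RData file through a URL connection in base R, the example below writes a small stand-in file to a temporary location and loads it back via a file:// URL, which lets it run without network access. The data frame `demo`, the variables `fileUrl`, `conn`, and `env`, and the Unix-style temporary path are all assumptions made for illustration; none of them come from the book.

```r
# Hypothetical stand-in data; this is NOT the UCI dataset from the text.
demo <- data.frame(id = 1:3, value = c(10, 20, 30))

# Save it to a temporary .RData file so the sketch runs offline.
tmp <- tempfile(fileext = ".RData")
save(demo, file = tmp)

# Build a file:// URL (assumes a Unix-like absolute path starting with "/").
fileUrl <- paste0("file://", tmp)

# Open a URL connection and load the file into a separate environment,
# so the restored objects do not overwrite anything in the workspace.
conn <- url(fileUrl)
env <- new.env()
loaded <- load(conn, envir = env)  # load() returns the names of the restored objects
close(conn)

print(loaded)         # names of the objects found in the file
print(dim(env$demo))  # rows and columns of the restored data frame
```

For a remote file, the same pattern would presumably apply with `fileUrl` replaced by an http address such as the string built above, provided the server is still reachable. Loading into a separate environment is a design choice here: an .RData file can contain any number of objects with arbitrary names, and inspecting `loaded` first avoids silently replacing existing variables.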
