How to do it...

You will need one of the following command-line tools, curl or wget, to retrieve the specified data:

  1. You can start by downloading the dataset using either of the following two commands. The first command is as follows:
curl -o pima-indians-diabetes.data http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data

This is an alternative that you can use:

wget http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data -O pima-indians-diabetes.data
  2. Now we begin data exploration by looking at how the data in pima-indians-diabetes.data is formatted (from a Mac or Linux terminal):
head -5 pima-indians-diabetes.data
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
...
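
Before loading the file into Spark, it can help to confirm that every row has the same number of comma-separated fields. The following is a minimal sketch, not part of the original recipe. It assumes awk is available and, so that it runs without a network connection, works on a hypothetical file named sample.data that holds only the two rows from the head output above. To check the real download, point it at pima-indians-diabetes.data instead.

```shell
# Hypothetical sample file: the two rows shown in the head output above.
cat > sample.data <<'EOF'
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
EOF

# Split each line on commas, tally how many rows have each field count,
# and print one summary line per distinct count.
summary=$(awk -F',' '{ counts[NF]++ }
  END { for (n in counts) print n " fields: " counts[n] " rows" }' sample.data)

echo "$summary"
# prints: 9 fields: 2 rows
```

A single summary line means the rows are consistently shaped. More than one line would point to malformed rows that need attention before parsing.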
