How to do it...

  1. Download the dataset using either of the following commands:
wget http://files.grouplens.org/datasets/movielens/ml-1m.zip

Alternatively, use curl:

curl http://files.grouplens.org/datasets/movielens/ml-1m.zip -o ml-1m.zip
  2. Now you need to decompress the ZIP:
unzip ml-1m.zip
creating: ml-1m/
inflating: ml-1m/movies.dat
inflating: ml-1m/ratings.dat
inflating: ml-1m/README
inflating: ml-1m/users.dat

The command will create a directory named ml-1m with data files decompressed inside.

  3. Change into the ml-1m directory:
cd ml-1m
  4. Now we begin our first steps of data exploration by verifying how the data in movies.dat is formatted:
head -5 movies.dat
1::Toy Story (1995)::Animation|Children's|Comedy
...
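
As a possible follow-up to the format check in the last step, the sketch below counts how many "::"-separated fields each record has and previews the first few records as tab-separated columns. The helper names field_counts and preview are hypothetical and not part of the recipe; the sketch assumes awk, head, and sort are available and that the current directory is ml-1m.

```shell
#!/bin/sh
# Hypothetical helper (not from the recipe): for a "::"-delimited file,
# print each distinct field count followed by the number of lines having it.
field_counts() {
    awk -F'::' '{ counts[NF]++ } END { for (n in counts) print n, counts[n] }' "$1" | sort -n
}

# Hypothetical helper (not from the recipe): show the first N records
# (default 5) with the "::" separator replaced by tabs.
preview() {
    head -n "${2:-5}" "$1" | awk -F'::' -v OFS='\t' '{ $1 = $1; print }'
}

# Run only if the data file is present, so the script is harmless elsewhere.
if [ -f movies.dat ]; then
    field_counts movies.dat
    preview movies.dat 5
fi
```

If field_counts prints a single line, every record has the same number of fields, which suggests the file can be split on "::" without special handling. More than one line would point to records that need a closer look before parsing.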
