Data exploration

The movie dataset and the corresponding ratings dataset were downloaded from the MovieLens website (https://movielens.org). According to the data description on the MovieLens website, all the ratings are contained in the ratings.csv file. Each row of this file after the header represents one rating of one movie by one user.
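To make the one-row-per-rating layout concrete, here is a minimal sketch in plain Scala that skips the header line and turns every remaining line into a record. The names `RatingsParseSketch`, `Rating`, and `parseRatings` are hypothetical, the column order (userId, movieId, rating, timestamp) is assumed from the dataset description, and the sample rows are made up for illustration; they are not taken from the dataset.

```scala
object RatingsParseSketch {
  // Hypothetical record type: one rating of one movie by one user.
  final case class Rating(userId: Int, movieId: Int, rating: Double, timestamp: Long)

  // Drops the header line, then parses each remaining non-empty line.
  // Lines that do not have exactly four comma-separated fields are skipped.
  def parseRatings(lines: Seq[String]): Seq[Rating] =
    lines.drop(1).filter(_.trim.nonEmpty).flatMap { line =>
      line.split(",").map(_.trim) match {
        case Array(userId, movieId, rating, timestamp) =>
          Some(Rating(userId.toInt, movieId.toInt, rating.toDouble, timestamp.toLong))
        case _ => None
      }
    }

  def main(args: Array[String]): Unit = {
    // Made-up sample content, for illustration only.
    val sample = Seq(
      "userId,movieId,rating,timestamp",
      "1,10,4.0,1000000000",
      "1,20,3.5,1000000100"
    )
    parseRatings(sample).foreach(println)
  }
}
```

For the real file, the lines could be read with `scala.io.Source.fromFile("ratings.csv").getLines().toSeq` and passed to `parseRatings`; the path is an assumption about where the download was saved.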

The CSV dataset has the following columns: userId, movieId, rating, and timestamp. These are shown in Figure 14. The rows are ordered first by userId and then, within each user, by movieId. Ratings are made on a five-star scale in half-star increments (from 0.5 stars up to 5.0 stars). The timestamps represent the number of seconds since midnight Coordinated Universal Time (UTC) on January 1, 1970. We have 105,339 ...
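The column conventions described above can be checked with a few small helpers. The following is a sketch only: `RatingFieldsSketch` and its method names are hypothetical, and it uses nothing beyond the Scala standard library and `java.time`.

```scala
import java.time.Instant

object RatingFieldsSketch {
  // True when the value lies on the half-star scale: 0.5, 1.0, ..., 5.0.
  def isValidRating(r: Double): Boolean =
    r >= 0.5 && r <= 5.0 && (r * 2) == math.rint(r * 2)

  // Interprets a timestamp as seconds since 1970-01-01T00:00:00Z.
  def toInstant(epochSeconds: Long): Instant =
    Instant.ofEpochSecond(epochSeconds)

  // True when (userId, movieId) pairs are sorted by userId, then by movieId.
  def isOrdered(keys: Seq[(Int, Int)]): Boolean =
    keys.sliding(2).forall {
      case Seq(a, b) => Ordering[(Int, Int)].lteq(a, b)
      case _         => true // zero or one element is trivially ordered
    }

  def main(args: Array[String]): Unit = {
    println(isValidRating(3.5))                // true
    println(isValidRating(3.7))                // false
    println(toInstant(0L))                     // 1970-01-01T00:00:00Z
    println(isOrdered(Seq((1, 10), (1, 20), (2, 5)))) // true
  }
}
```

Checks like these are cheap to run once after loading the file and can catch a misread column order or a timestamp stored in milliseconds rather than seconds.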
