Collaborative filtering using implicit feedback

Sometimes feedback is available not as ratings but as a record of behavior: audio tracks played, movies watched, and so on. At first glance, this data may not look as good as explicit ratings, but it is much more exhaustive, since it covers every interaction rather than only the items a user chose to rate.

Getting ready

We are going to use the Million Song Dataset Challenge data from http://www.kaggle.com/c/msdchallenge/data. You need to download three files:

  • kaggle_visible_evaluation_triplets.txt
  • kaggle_users.txt
  • kaggle_songs.txt
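
Before uploading, it can help to sanity-check the triplets file locally. The following is a minimal sketch, not part of the recipe itself. It assumes each line of the triplets file has three tab-separated fields (user ID, song ID, play count), which is an assumption about the file layout rather than something stated above. The helper name count_bad_triplets is hypothetical.

```shell
# Hypothetical helper: print how many lines of a triplets file are malformed.
# ASSUMPTION: each line is user_id<TAB>song_id<TAB>play_count, where
# play_count is a non-negative integer.
count_bad_triplets() {
  awk -F '\t' '
    # A line is bad if it does not have exactly three fields
    # or if the third field is not made up of digits only.
    NF != 3 || $3 !~ /^[0-9]+$/ { bad++ }
    END { print bad + 0 }
  ' "$1"
}
```

If the layout assumption holds, running count_bad_triplets kaggle_visible_evaluation_triplets.txt should print 0. A non-zero count suggests either a corrupted download or a different file layout.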

Now perform the following steps:

  1. Create a songdata folder in HDFS to hold the three files:
    $ hdfs dfs -mkdir songdata
    
  2. Upload the song data to HDFS:
    $ hdfs dfs -put kaggle_visible_evaluation_triplets.txt songdata/
    $ hdfs dfs -put kaggle_users.txt songdata/
    $ hdfs dfs -put kaggle_songs.txt songdata/
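
Once the files are uploaded, you may want to confirm that all three are present in HDFS. The following is a small sketch under stated assumptions: the helper name verify_upload is hypothetical, and the directory defaults to the songdata folder created in step 1. It relies only on hdfs dfs -test -e, which exits with status 0 when the given path exists.

```shell
# Hypothetical helper: check that the three song data files exist in HDFS.
# ASSUMPTION: the files were uploaded under the directory given as the first
# argument (default: songdata), relative to your HDFS home directory.
verify_upload() {
  vu_dir="${1:-songdata}"
  vu_missing=0
  for vu_file in kaggle_visible_evaluation_triplets.txt kaggle_users.txt kaggle_songs.txt; do
    # "hdfs dfs -test -e" succeeds only if the path exists.
    if hdfs dfs -test -e "$vu_dir/$vu_file"; then
      echo "ok: $vu_dir/$vu_file"
    else
      echo "missing: $vu_dir/$vu_file"
      vu_missing=1
    fi
  done
  # Return non-zero if any file was not found.
  return "$vu_missing"
}
```

Calling verify_upload songdata prints one line per file and returns a non-zero status if any of them is missing, so it can also be used as a guard in a script.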
