Chapter 3. What I Do, Not What I Say

One of the most important steps in any machine-learning project is data extraction. Which data should you choose? How should it be prepared to be appropriate input for your machine-learning model?

In the case of recommendation, the choice of data depends in part on what you think will best reveal what users want to do (what they like and do not like), so that the recommendations your system offers are effective. The best choice of data may surprise you: it’s not user ratings. What a user actually does usually tells you much more about her preferences than what she claims to like when filling out a customer ratings form. One reason is that ratings come from only a subset of your user pool, and a skewed one at that: it consists of the users who like, or at least are willing, to rate content. In addition, people who feel strongly positive or negative about an item or option may be more motivated to rate it than those who are fairly neutral, which skews the results further. We’ve seen some cases where no more than a few percent of users rated content.

Furthermore, most people do not entirely understand their own likes and dislikes, especially where new and unexplored activities are concerned. The good news is that there is a simple solution: you can watch what a user does instead of just what he says in ratings. Of course, it is not enough to watch one or a few users; those few observations will not give you a reliable way to make recommendations. ...
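To make the idea of watching behavior concrete, here is a minimal sketch of how observed actions might be gathered into per-user histories. The event format, the user and item identifiers, and the function names are all hypothetical and chosen only for illustration; the text does not prescribe a log format, and a real system would read from its own logs across many users.

```python
from collections import defaultdict

# Hypothetical behavior log: (user_id, item_id, action) tuples.
# These records are made up for illustration, not real data.
events = [
    ("u1", "item_a", "view"),
    ("u1", "item_b", "purchase"),
    ("u2", "item_a", "view"),
    ("u2", "item_c", "view"),
    ("u3", "item_b", "purchase"),
    ("u3", "item_c", "view"),
    ("u1", "item_b", "view"),  # repeat interaction with the same item
]


def build_histories(events):
    """Collect the set of items each user interacted with."""
    histories = defaultdict(set)
    for user_id, item_id, _action in events:
        histories[user_id].add(item_id)
    return dict(histories)


def item_user_counts(histories):
    """Count how many distinct users interacted with each item."""
    counts = defaultdict(int)
    for items in histories.values():
        for item_id in items:
            counts[item_id] += 1
    return dict(counts)


histories = build_histories(events)
counts = item_user_counts(histories)
```

Every user who does anything contributes a history here, whether or not that user ever fills out a ratings form. The sketch deliberately ignores the action type and repeat counts; whether to weight purchases differently from views is a separate design decision.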

