Chapter 9. Back to the Feature: Building an Academic Paper Recommender

“In mathematics you don’t understand things. You just get used to them.”

John von Neumann

When the path from data to results was first introduced in Figure 1-1, it may not have been clear how there would ever be a way forward. Throughout this book, we have focused on introducing basic principles of feature engineering using toy models and clean, simple datasets. These examples were intended to be illustrative and enlightening. 

Machine learning examples generally show only the best-case scenario and its results, which masks the messy path from data to results that we have described thus far in the book. Now that the foundation is set, we are leaving the world of simple, toy data and diving into the process of feature engineering with a real-world, structured dataset. As we move through each step, we will examine the raw data forming each feature, what the transformed feature becomes, and what trade-offs we make along the way.

To be clear, our goal for this example is not to build the best model for this dataset. Rather, it is to demonstrate the practical application of a handful of our techniques, and to show how to examine more deeply whether each technique is actually providing value to the model we are building.

Item-Based Collaborative Filtering

Our task will be to build a recommender for academic papers using a subsample of the Microsoft Academic Graph dataset. This should come in extremely handy for all of you who are searching for ...
