Bad Data Handbook by Q. Ethan McCallum

Chapter 16. How to Feed and Care for Your Machine-Learning Experts

Pete Warden

Machine learning is a craft as well as a science, and for the best results you’ll often need to turn to experienced specialists. Not every team has enough interesting problems to justify a full-time machine-learning position, though. As the value of the approach has become better known, the demand for part-time or project-based machine-learning work has grown, but it’s often hard for a traditional engineering team to work effectively with outside experts in the field. I’m going to talk about some of the things I learned while running an outsourced project through Kaggle,[73] a community of thousands of researchers who participate in data competitions modeled on the Netflix Prize. This was an extreme example of outsourcing: we literally handed over a dataset, a short description, and a success metric to a large group of strangers. It had almost none of the traditional interactions you’d expect, but it did teach me valuable lessons that apply to any interaction with machine-learning specialists.

Define the Problem

My company, Jetpac, creates a travel magazine written by your friends, using vacation photos they’ve shared with you on Facebook and other social services. The average user has had over two hundred thousand pictures shared with them, so we have a lot to choose from. Unfortunately, many of them are not very good, at least for our purposes. When we showed people our prototype, most would be turned off ...
