O'Reilly logo

Machine Learning in Action by Peter Harrington

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 15. Big data and MapReduce

 

This chapter covers
  • MapReduce
  • Using Python with Hadoop Streaming
  • Automating MapReduce with mrjob
  • Training support vector machines in parallel with the Pegasos algorithm

 

I often hear “Your examples are nice, but my data is big, man!” I have no doubt that you work with data sets larger than the examples used in this book. With so many devices connected to the internet and people interested in making data-driven decisions, the amount of data we’re collecting has outpaced our ability to process it. Fortunately, a number of open source software projects allow us to process large amounts of data. One project, called Hadoop, is a Java framework for distributing data processing to multiple machines.

Imagine ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required