Cover by Chuck Lam

Chapter 5. Advanced MapReduce

This chapter covers

  • Chaining multiple MapReduce jobs
  • Performing joins of multiple data sets
  • Creating Bloom filters

As your data processing becomes more complex, you’ll want to exploit more of Hadoop’s features. This chapter focuses on some of these more advanced techniques.

When handling advanced data processing, you’ll often find that you can’t program the process into a single MapReduce job. Hadoop supports chaining MapReduce programs together to form a bigger job. You’ll also find that advanced data processing often involves more than one data set. We’ll explore various joining techniques in Hadoop for simultaneously processing multiple data sets. You can code certain data processing tasks more efficiently ...
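
To make the idea of chaining concrete, here is a minimal sketch in plain Python that simulates two MapReduce jobs in memory, with the output of the first job fed in as the input of the second. It does not use the Hadoop API: the `run_job` helper and the mapper and reducer names are hypothetical, chosen only for illustration, and the two-line input is made up.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job in memory (an illustration, not the Hadoop API)."""
    groups = defaultdict(list)
    # Map phase: each input record may emit any number of (key, value) pairs.
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # shuffle: group values by key
    # Reduce phase: visit keys in sorted order and collect the reducer output.
    output = []
    for out_key in sorted(groups):
        output.extend(reducer(out_key, groups[out_key]))
    return output


# Job 1: count how many times each word occurs.
def tokenize(_offset, line):
    for word in line.split():
        yield word, 1


def sum_counts(word, ones):
    yield word, sum(ones)


# Job 2: group the words by their count, using job 1's output as input.
def invert(word, count):
    yield count, word


def collect_words(count, words):
    yield count, sorted(words)


lines = [(0, "a b a"), (1, "b c a")]  # hypothetical (offset, line) records
word_counts = run_job(lines, tokenize, sum_counts)
by_frequency = run_job(word_counts, invert, collect_words)
```

The point of the sketch is only the data flow: the second job cannot start until the first has finished, because its input is the first job's complete output. Neither job needs to know about the other beyond agreeing on the record format.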
