O'Reilly logo

Mastering Hadoop by Sandeep Karanth

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 2. Advanced MapReduce

MapReduce is a programming model for parallel and distributed processing of data. It consists of two steps: Map and Reduce. These steps are inspired from functional programming, a branch of computer science that deals with mathematical functions as computational units. Properties of functions such as immutability and statelessness are attractive for parallel and distributed processing. They provide a high degree of parallelism and fault tolerance at lower costs and semantic complexity.

In this chapter, we will look at advanced optimizations when running MapReduce jobs on Hadoop clusters. Every MapReduce job has input data and a Map task per split of this data. The Map task calls a map function repeatedly on every record, ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required