
HDInsight Essentials by Rajesh Nadipalli


MapReduce solution

If you want to build your own application on top of HDFS, MapReduce gives you a low-level programming model for doing so. MapReduce is also the foundation for higher-level projects such as Hive and Pig. Let's walk through a MapReduce example in more detail, using the airline on-time performance dataset.

Design

The approach to this problem is as follows:

  • The Map task processes the input one line at a time; each record ends with a newline.
  • For each line, the Map task emits a key, which is the concatenated string "Year + Month + Unique Carrier", and a value, which is a metric such as Departure Delay.
  • The Reduce task receives all the values for a given key, for example, 2012,01,AA, and computes the average: average departure delay = total departure delay / total number of entries. The MapReduce ...
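The design above can be sketched in a few lines of plain Python. This is only an illustration of the map, group, and reduce steps. It does not use the Hadoop API, and it assumes a simplified, hypothetical four-column input (Year, Month, Carrier, DepDelay) rather than the real layout of the airline dataset. The names map_line, reduce_delays, run_job, and SAMPLE_LINES are made up for this sketch.

```python
from collections import defaultdict

# Hypothetical sample input: Year,Month,Carrier,DepDelay (not the real file layout).
SAMPLE_LINES = [
    "Year,Month,Carrier,DepDelay",  # header row, skipped by the mapper
    "2012,1,AA,10",
    "2012,1,AA,20",
    "2012,1,DL,5",
    "2012,2,AA,",                   # missing delay, skipped by the mapper
]


def map_line(line):
    """Map step: turn one input line into a (key, value) pair, or None."""
    fields = line.strip().split(",")
    if len(fields) != 4:
        return None
    year, month, carrier, dep_delay = fields
    try:
        # Key is "Year,Month,Carrier" with a zero-padded month, e.g. 2012,01,AA.
        key = f"{year},{int(month):02d},{carrier}"
        value = float(dep_delay)
    except ValueError:
        return None  # header row or non-numeric field
    return key, value


def reduce_delays(key, values):
    """Reduce step: average = total departure delay / number of entries."""
    values = list(values)
    return key, sum(values) / len(values)


def run_job(lines):
    """Simulate the shuffle by grouping mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        pair = map_line(line)
        if pair is not None:
            grouped[pair[0]].append(pair[1])
    return dict(reduce_delays(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    for key, average in run_job(SAMPLE_LINES).items():
        print(key, average)
```

In a real Hadoop job, the framework performs the grouping between the Map and Reduce tasks. The dictionary in run_job only stands in for that step so the sketch can run on a single machine.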
