Factors affecting the performance of MapReduce

The time MapReduce takes to process input data depends on many factors. One is the algorithm you use to implement your map and reduce functions; others are external to your code. Based on our experience and observation, the following are the major factors that affect MapReduce performance:

  • Hardware (or resources) such as CPU clock speed, disk I/O, network bandwidth, and memory size.
  • The underlying storage system.
  • The size of the input, shuffle, and output data, which is closely correlated with the runtime of a job.
  • Job algorithms (or programs) such as map, reduce, partition, combine, and compress. Some algorithms may be hard to conceptualize ...
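
As a rough illustration of how two of these factors interact (the job algorithm and the size of the shuffle data), the following Python sketch simulates a word count job in memory, with and without a combiner. It does not use Hadoop; the function names (`map_words`, `combine`, `reduce_counts`, `run_job`) and the sample splits are hypothetical, chosen only for this example.

```python
from collections import Counter


def map_words(split):
    """Map step: emit one (word, 1) pair per word in an input split."""
    return [(word, 1) for word in split.split()]


def combine(pairs):
    """Combiner: pre-aggregate one map task's pairs before the shuffle."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())


def reduce_counts(shuffled):
    """Reduce step: sum the counts for each word across all map tasks."""
    totals = Counter()
    for word, n in shuffled:
        totals[word] += n
    return dict(totals)


def run_job(splits, use_combiner):
    """Run the simulated job; return (result, number of shuffled records)."""
    shuffled = []
    for split in splits:  # each split stands in for one map task
        pairs = map_words(split)
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)  # stand-in for data sent over the network
    return reduce_counts(shuffled), len(shuffled)


# Hypothetical input: two small splits.
splits = ["a b a a", "b b a c"]
plain_result, plain_records = run_job(splits, use_combiner=False)
combined_result, combined_records = run_job(splits, use_combiner=True)

print(plain_records, combined_records)
```

Both runs produce the same word counts, but the run with the combiner passes fewer records to the reduce step. On a real cluster the effect depends on the data and the job, so treat this only as a sketch of the idea.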
