Hadoop in Practice by Alex Holmes

Chapter 6. Diagnosing and tuning performance problems

 

In this chapter
  • Measuring and visualizing MapReduce execution times
  • Optimizing the shuffle and sort phases
  • Improving performance with user space MapReduce best practices

 

Imagine you wrote a new piece of MapReduce code and you’re executing it on your shiny new cluster. You’re surprised to learn that despite having a good-sized cluster, your job is taking significantly longer than you expected. You’ve obviously hit a performance issue with your job, but how do you figure out where the problem lies?

One of Hadoop’s selling points when it comes to performance is that it scales horizontally. This means that adding nodes tends to yield a linear increase in throughput, and often in job execution ...
