
Learning Hadoop 2 by Garry Turkington, Gabriele Modena


Cluster tuning

In addition to the previous comments specific to a cluster run on EMR, there are some general points to keep in mind when running workloads on any type of cluster. These points become more explicit when running outside of EMR, because EMR often abstracts some of the details.

JVM considerations

You should be running the 64-bit version of a JVM in server mode. Server mode can take longer to produce optimized code, but it applies more aggressive optimization strategies and re-optimizes code over time. This makes it a much better fit for long-running services, such as Hadoop processes.
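As a rough way to confirm which kind of JVM a process is actually running on, the following sketch prints a few standard system properties and applies two simple heuristics. The class name `JvmModeCheck` and its helper methods are hypothetical names introduced here for illustration. The checks assume a HotSpot-style JVM: `sun.arch.data.model` is vendor-specific and may be absent, and the "Server" substring in `java.vm.name` is a naming convention rather than a guarantee.

```java
public class JvmModeCheck {

    // Heuristic (assumption): HotSpot-style JVMs usually report a VM name
    // containing "Server" when running the server compiler.
    public static boolean looksLikeServerVm(String vmName) {
        return vmName != null && vmName.contains("Server");
    }

    // Heuristic (assumption): prefer the vendor-specific data model property
    // when present; otherwise fall back to looking for "64" in os.arch.
    public static boolean looksLike64Bit(String dataModel, String osArch) {
        if (dataModel != null) {
            return dataModel.equals("64");
        }
        return osArch != null && osArch.contains("64");
    }

    public static void main(String[] args) {
        String vmName = System.getProperty("java.vm.name");
        String dataModel = System.getProperty("sun.arch.data.model"); // may be null
        String osArch = System.getProperty("os.arch");

        System.out.println("java.vm.name        = " + vmName);
        System.out.println("sun.arch.data.model = " + dataModel);
        System.out.println("os.arch             = " + osArch);
        System.out.println("looks like server VM: " + looksLikeServerVm(vmName));
        System.out.println("looks like 64-bit   : " + looksLike64Bit(dataModel, osArch));
    }
}
```

Running this with the same `java` binary that launches the Hadoop daemons gives a quick sanity check. Treat a negative result as a prompt to inspect the JVM installation rather than as proof of misconfiguration.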

Ensure that you allocate enough memory to the JVM to prevent overly frequent garbage collection (GC) pauses. The concurrent mark-and-sweep collector is currently ...
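To see how much heap a JVM has been given and how often its collectors have run, a small report along the following lines can help. This is a minimal sketch using the standard `java.lang.management` API. The class name `HeapAndGcReport` and its methods are hypothetical, and the collector names printed depend on which collector the JVM was started with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class HeapAndGcReport {

    // Maximum heap the JVM will attempt to use, in MiB.
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // One line per registered collector: name, collection count, and
    // accumulated collection time in milliseconds since JVM start.
    public static String describeCollectors() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.print(describeCollectors());
    }
}
```

After compiling, running it as, for example, `java -Xmx2g HeapAndGcReport` shows the effect of the heap setting. The 2 GB value is purely illustrative and not a recommendation from the text.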
