Cluster tuning

Beyond the earlier comments specific to clusters run on EMR, there are some general points to keep in mind when running workloads on any type of cluster. These matter more when running outside of EMR, because EMR often abstracts away some of the details.

JVM considerations

You should be running a 64-bit JVM in server mode. The server-mode compiler can take longer to produce optimized code, but it uses more aggressive optimization strategies and will re-optimize code over time. This makes it a much better fit for long-running services, such as Hadoop processes.
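As a quick sanity check, a small Java program can report which kind of JVM a node is actually running. The sketch below is illustrative only: the class name `JvmModeCheck` and its helper methods are hypothetical, the `sun.arch.data.model` property is HotSpot-specific and may be absent on other JVMs, and inspecting the VM name for the word "Server" is a heuristic rather than a guaranteed contract.

```java
public class JvmModeCheck {

    // Heuristic: HotSpot server VMs typically report a name containing "Server",
    // for example "Java HotSpot(TM) 64-Bit Server VM".
    public static boolean isServerVm(String vmName) {
        return vmName != null && vmName.contains("Server");
    }

    // Prefer the HotSpot-specific data model property when it is present;
    // otherwise fall back to guessing from the architecture name (assumption).
    public static boolean is64Bit(String dataModel, String osArch) {
        if ("64".equals(dataModel)) {
            return true;
        }
        if ("32".equals(dataModel)) {
            return false;
        }
        return osArch != null && osArch.contains("64");
    }

    public static void main(String[] args) {
        String vmName = System.getProperty("java.vm.name");
        String dataModel = System.getProperty("sun.arch.data.model"); // may be null
        String osArch = System.getProperty("os.arch");

        System.out.println("VM name:   " + vmName);
        System.out.println("64-bit:    " + is64Bit(dataModel, osArch));
        System.out.println("Server VM: " + isServerVm(vmName));
    }
}
```

Running this with the same `java` binary and options that launch the Hadoop daemons on a node gives a rough indication of whether that node matches the recommendation.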

Ensure that you allocate enough memory to the JVM to prevent overly frequent garbage collection (GC) pauses. The concurrent mark-and-sweep collector is currently ...
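One way to judge whether a JVM has enough memory is to look at how much of its uptime has gone to garbage collection. The following is a minimal sketch using the standard `java.lang.management` beans; the class name `GcReport` and the helper `gcOverhead` are hypothetical, and what counts as an acceptable overhead is left to you, since it depends on the workload.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcReport {

    // Fraction of uptime spent in GC; returns 0 when uptime is not positive.
    public static double gcOverhead(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    // Sum of accumulated collection time across all collectors in this JVM.
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();

        System.out.println("Max heap (MB): " + maxHeapMb);
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector: " + gc.getName()
                    + ", collections: " + gc.getCollectionCount());
        }
        System.out.println("GC overhead: " + gcOverhead(totalGcMillis(), uptime));
    }
}
```

The maximum heap reported here reflects the `-Xmx` setting (or the JVM default when none is given), so the same code can confirm that a memory setting was actually picked up by the process.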
