Learning Hadoop 2 by Garry Turkington, Gabriele Modena

Building a cluster on EMR

Elastic MapReduce is a flexible solution that, depending on requirements and workloads, can sit alongside or replace a physical Hadoop cluster. As we've seen so far, EMR provides clusters preloaded and configured with Hive, Streaming, and Pig, as well as custom JAR clusters that execute user-supplied MapReduce applications.

A second distinction to make is between transient and long-running life cycles. A transient EMR cluster is created on demand: data is loaded into S3 or HDFS, a processing workflow is executed, the output is stored, and the cluster is automatically shut down. A long-running cluster is kept alive after the workflow terminates and remains available for new data to be copied over ...
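To make the two life cycles concrete, here is a minimal sketch that builds the request for a custom JAR cluster in Python. It assumes the boto3 SDK, which the text above does not mention; the dictionary keys follow the parameters of boto3's EMR `run_job_flow` call. The helper names, bucket paths, instance types, release label, and role names are hypothetical placeholders. The sketch only constructs the request, so it runs without AWS credentials.

```python
def custom_jar_step(name, jar, args, long_running=False):
    """Describe one custom JAR step (a MapReduce application)."""
    return {
        "Name": name,
        # A transient cluster shuts down if the step fails; a long-running
        # one cancels pending steps and waits for new work.
        "ActionOnFailure": "CANCEL_AND_WAIT" if long_running else "TERMINATE_CLUSTER",
        "HadoopJarStep": {"Jar": jar, "Args": list(args)},
    }


def build_cluster_request(name, steps, long_running=False):
    """Build keyword arguments for boto3's emr.run_job_flow(**request)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.36.0",  # assumed release; pick one valid for your account
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # hypothetical sizing
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # False: transient, the cluster terminates once all steps finish.
            # True: long-running, the cluster stays up and accepts new steps.
            "KeepJobFlowAliveWhenNoSteps": long_running,
        },
        "Steps": steps,
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default role names; assumed to exist
        "ServiceRole": "EMR_DefaultRole",
    }


# Hypothetical S3 locations for the application JAR, input, and output.
step = custom_jar_step(
    "wordcount",
    "s3://example-bucket/jars/wordcount.jar",
    ["s3://example-bucket/input/", "s3://example-bucket/output/"],
)
transient_request = build_cluster_request("nightly-wordcount", [step])
long_running_request = build_cluster_request("shared-cluster", [], long_running=True)
```

With boto3 installed and credentials configured, a request built this way could be submitted with `boto3.client("emr").run_job_flow(**transient_request)`. The only structural difference between the two life cycles in this sketch is the `KeepJobFlowAliveWhenNoSteps` flag, plus the choice of what a failed step should do to the cluster.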
