Executing a Hive script using EMR

Hive provides a SQL-like query layer for the data stored in HDFS utilizing Hadoop MapReduce underneath. Amazon EMR supports executing Hive queries on the data stored in S3. Refer to the Apache Hive recipes in Chapter 6, Hadoop Ecosystem – Apache Hive, for more information on using Hive for large-scale data analysis.

In this recipe, we are going to execute a Hive script to perform the computation we did in the Executing a Pig script using EMR recipe earlier. We will use the Human Development Reports data (http://hdr.undp.org/en/statistics/data/) to print names of countries that have a GNI value greater than $2000 of gross national income per capita (GNI) sorted by GNI.

How to do it...

The following steps show how ...

Get Hadoop MapReduce v2 Cookbook - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.