O'Reilly logo

Learning Hadoop 2 by Garry Turkington, Gabriele Modena

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Prerequisites

Before diving into specific technologies, let's generate some data that we'll use in the examples throughout this chapter. We'll create a modified version of a former Pig script as the main functionality for this. The script in this chapter assumes that the Elephant Bird JARs used previously are available in the /jar directory on HDFS. The full source code is at https://github.com/learninghadoop2/book-examples/blob/master/ch7/extract_for_hive.pig, but the core of extract_for_hive.pig is as follows:

-- load JSON data tweets = load '$inputDir' using com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad'); -- Tweets tweets_tsv = foreach tweets { generate (chararray)CustomFormatToISO($0#'created_at', 'EEE MMMM d HH:mm:ss Z y') as ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required