O'Reilly logo

Learning Hadoop 2 by Garry Turkington, Gabriele Modena

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Building a tweet analysis capability

In earlier chapters, we used various implementations of Twitter data analysis to describe several concepts. We will take this capability to a deeper level and approach it as a major case study.

In this chapter, we will build a data ingest pipeline, constructing a production-ready dataflow that is designed with reliability and future evolution in mind.

We'll build out the pipeline incrementally throughout the chapter. At each stage, we'll highlight what has changed but can't include full listings at each stage without trebling the size of the chapter. The source code for this chapter, however, has every iteration in its full glory.

Getting the tweet data

The first thing we need to do is get the actual tweet data. ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required