Chapter 10. Data Collection with Flume

In the previous two chapters, we saw how Hive gives Hadoop a relational-style interface to its data and how Sqoop allows it to exchange data with "real" relational databases. Although these are very common use cases, there are, of course, many other types of data source that we may want to get into Hadoop.

In this chapter, we will cover:

  • An overview of data commonly processed in Hadoop
  • Simple approaches to pull this data into Hadoop
  • How Apache Flume can make this task a lot easier
  • Common patterns for Flume setups, from simple through sophisticated (a brief configuration sketch follows this list)
  • Common issues, such as the data lifecycle, that need to be considered regardless of technology
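
As a taste of what is to come, here is a minimal sketch of a Flume NG (1.x) agent configuration that listens for text lines on a network port and writes them into HDFS. The agent name (agent1), the port, and the HDFS path are illustrative placeholders rather than values used later in the book.

    # Name the components managed by this agent
    agent1.sources  = netsrc
    agent1.channels = memch
    agent1.sinks    = hdfssink

    # Source: receive newline-terminated text on a TCP port
    agent1.sources.netsrc.type     = netcat
    agent1.sources.netsrc.bind     = localhost
    agent1.sources.netsrc.port     = 44444
    agent1.sources.netsrc.channels = memch

    # Channel: buffer events in memory between source and sink
    agent1.channels.memch.type     = memory
    agent1.channels.memch.capacity = 1000

    # Sink: write the buffered events into HDFS as plain text
    agent1.sinks.hdfssink.type          = hdfs
    agent1.sinks.hdfssink.hdfs.path     = /flume/incoming
    agent1.sinks.hdfssink.hdfs.fileType = DataStream
    agent1.sinks.hdfssink.channel       = memch

Such an agent would be started with a command along the lines of flume-ng agent --conf conf --conf-file agent1.conf --name agent1. Note that hdfs.fileType = DataStream writes raw text rather than Flume's default SequenceFile format.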

A note about AWS

This chapter will discuss AWS less than any other chapter in the book. In fact, ...
