Chapter 8. There Is No Spoon – The Realities of Real-time Distributed Data Collection

In this last chapter, I want to cover some of my less concrete, more random thoughts on collecting data into Hadoop. There's no hard science behind some of this, and you should feel perfectly free to disagree with me.

While Hadoop is a great tool for consuming vast quantities of data, I often think of a picture of the logjam that occurred in 1886 on the St. Croix River in Minnesota (http://www.nps.gov/sacn/historyculture/stories.htm). When dealing with too much data, you want to make sure you don't jam your river. Be sure to take the previous chapter on monitoring seriously, and not as just a nice-to-have.

Transport time versus log time

I had a ...
