Chapter 9. There Is No Spoon – the Realities of Real-time Distributed Data Collection

In this last chapter, I thought we should cover some of the less concrete, random thoughts I have around data collection into Hadoop. There's no hard science behind some of this, and you should feel perfectly alright to disagree with me.

While Hadoop is a great tool to consume vast quantities of data, I often think of a picture of the logjam that occurred in 1886 in the St. Croix River in Minnesota (http://www.nps.gov/sacn/historyculture/stories.htm). When dealing with too much data, you want to make sure you don't jam your river. Be sure to take the previous chapter on monitoring seriously and not just as nice-to-have information.

Transport time versus log time ...

Get Apache Flume: Distributed Log Collection for Hadoop - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.