Summary
In this chapter, we covered several real-world considerations to keep in mind when planning your Flume implementation, including:
- Transport time does not always match event time
- The mayhem that Daylight Saving Time can introduce into certain time-based logic
- Capacity planning considerations
- Items to consider when you have more than one data center
- Data compliance
- Data retention and expiration
I hope you enjoyed this book and that you can apply much of this information directly to your application/Hadoop integration efforts.
Thanks, this was fun!