Summary

In this chapter, we covered several real-world considerations you need to think about when planning your Flume implementation, including:

  • Transport time does not always match event time
  • The mayhem that Daylight Saving Time introduces into certain time-based logic
  • Capacity planning considerations
  • Items to consider when you have more than one data center
  • Data compliance
  • Data retention and expiration

I hope you enjoyed this book and that you can apply much of this information directly to your application/Hadoop integration efforts.

Thanks, this was fun!
