Summary

In this chapter we discussed the problem that Flume is attempting to solve: getting data into your Hadoop cluster for processing in an easily configured and reliable way. We also discussed the Flume agent and its logical components, including events, sources, channel selectors, channels, sink processors, and sinks.

The next chapter will cover these components in more detail, focusing on the most commonly used implementations of each. As in any good open source project, almost all of these components are extensible if the bundled ones don't do what you need.
