Chapter 3. Sources

Sources are the components responsible for accepting data into a Flume agent. Sources can accept data from other systems, like the Java Message Service (JMS), or the output of other processes. Sources are also used to receive data from other Flume agents whose sinks send data via RPC. There are even sources that can produce data. It is possible to write sources to accept data from pretty much anything!

The data that sources receive from an external system or from other agents (or produce themselves) is then written out to one or more channels configured for the source. This is the basic responsibility of a source.

In this chapter, we will discuss the design and workings of the various sources that come packaged with Flume and how to configure them optimally for use; we will also look at how to write a custom source.

Lifecycle of a Source

Sources are named components that are configured like any other component through the configuration file. Flume’s configuration system validates each source’s configuration and discards sources that are incorrectly configured. The validation done by the configuration system is pretty minimal, though. The Flume configuration system ensures that:

  • Each source has at least one properly configured channel “connected” to it.

  • Each source has a type parameter defined.

  • The source is in the active list of sources for the agent.
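As a rough illustration of these three checks, the fragment below sketches a minimal agent configuration that would satisfy them. The agent name a1, source name r1, and channel name c1 are hypothetical names chosen for this example, and the netcat source and memory channel are simply convenient stock components, not a recommendation.

```properties
# Hypothetical agent "a1": the names a1, r1, and c1 are illustrative only.

# The source r1 appears in the agent's active list of sources.
a1.sources = r1
a1.channels = c1

# The source declares a type parameter (here the stock netcat source).
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# The source is connected to at least one configured channel.
a1.sources.r1.channels = c1

# The channel itself must be properly configured for the check to pass.
a1.channels.c1.type = memory
```

Under the rules listed above, dropping r1 from a1.sources, omitting the type line, or leaving the source without a configured channel would each be expected to cause the source to be discarded.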

Once the configuration system approves a source, it is then instantiated and configured by the ConfigurationProvider ...
