Chapter 5. Sinks

Flume is designed with the ability to plug in practically every component, including the ones that write the data out to the eventual destination—in most cases, some data store.

The component that removes data from a Flume agent and writes it to another agent, a data store, or some other system is called a sink. The sink to use is specified in the agent’s configuration: it can be one of the sinks that come bundled with Flume or one written by the user (for custom sinks not built into Flume, the JARs should be dropped into Flume’s plugins.d directory).

Sinks are the components in a Flume agent that keep draining the channel, so that the sources can continue receiving events and writing to the channel. Sinks continuously poll the channel for events and remove them in batches. These batches of events are either written out to a storage or indexing system, or sent to another Flume agent.
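The batch-draining behavior described above can be illustrated with a small Python sketch. This is not Flume code: the channel is modeled as a plain in-memory deque, and `drain_in_batches`, `batch_size`, and `write_batch` are hypothetical names chosen for illustration. Unlike a real sink, which keeps polling, the sketch simply stops once the channel is empty.

```python
from collections import deque


def drain_in_batches(channel, batch_size, write_batch):
    """Take up to batch_size events at a time from channel and pass each
    batch to write_batch. Returns the number of batches written.

    Hypothetical helper: `channel` is a plain deque standing in for a
    Flume channel, and `write_batch` stands in for the destination.
    """
    batches_written = 0
    while channel:
        # Remove at most batch_size events from the head of the channel.
        batch = [channel.popleft() for _ in range(min(batch_size, len(channel)))]
        write_batch(batch)
        batches_written += 1
    return batches_written


# Seven events drained in batches of three: two full batches, one partial.
channel = deque(f"event-{i}" for i in range(7))
written = []
count = drain_in_batches(channel, batch_size=3, write_batch=written.append)
```

Draining the channel in this way is what frees up room for sources to keep writing new events.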

Sinks are fully transactional. Each sink starts a transaction with the channel before removing events in batches from it. Once the batch of events is successfully written out to storage or to the next Flume agent, the sink commits the transaction with the channel. Once the transaction is committed, the channel removes the events from its own internal buffers.
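The transaction lifecycle can be sketched in Python as follows. This is a toy model under stated assumptions, not Flume's API: `ToyChannel`, `process_batch`, and `failing_write` are hypothetical names, and the channel is a single-threaded in-memory queue. The sketch also shows the complementary case in which the write fails, so the transaction is rolled back and the events stay in the channel to be retried.

```python
from collections import deque


class ToyChannel:
    """Hypothetical in-memory channel with begin/commit/rollback semantics."""

    def __init__(self, events):
        self._queue = deque(events)
        self._in_flight = []  # events taken in the current transaction

    def begin(self):
        self._in_flight = []

    def take(self):
        """Return the next event, or None if the channel is empty."""
        if not self._queue:
            return None
        event = self._queue.popleft()
        self._in_flight.append(event)
        return event

    def commit(self):
        # The taken events are now permanently removed from the channel.
        self._in_flight = []

    def rollback(self):
        # Put the taken events back at the head, preserving their order.
        self._queue.extendleft(reversed(self._in_flight))
        self._in_flight = []

    def __len__(self):
        return len(self._queue)


def process_batch(channel, batch_size, write_batch):
    """Take one batch inside a transaction; commit only if the write succeeds."""
    channel.begin()
    try:
        batch = []
        for _ in range(batch_size):
            event = channel.take()
            if event is None:
                break
            batch.append(event)
        if batch:
            write_batch(batch)
        channel.commit()
        return True
    except Exception:
        channel.rollback()
        return False


def failing_write(batch):
    raise IOError("destination unavailable")


stored = []
channel = ToyChannel(["a", "b", "c"])
failed = process_batch(channel, 2, failing_write)  # rolled back, nothing lost
ok = process_batch(channel, 2, stored.extend)      # committed, events removed
```

Because events leave the channel only on commit, a failed write cannot lose data. The same batch is simply taken again in a later transaction.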

Flume comes packaged with a number of sinks that can write to storage and indexing systems such as HDFS, HBase, Solr, and Elasticsearch. These sinks are what are generally referred to as
