O'Reilly logo

Apache Flume: Distributed Log Collection for Hadoop by Steve Hoffman

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Monitoring performance metrics

Now that we have covered a few options for process monitoring, how do you know if your application is actually doing the work you think it is? On many occasions I've seen a stuck syslog-ng process that appeared to be running, but it just wasn't sending any data. I'm not picking on syslog-ng specifically; all software does this when conditions occur that it isn't designed to deal with.

When talking about Flume data flows, you need to monitor the following:

  • Data entering sources is within expected rates
  • Data isn't overflowing your channels
  • Data is exiting sinks at an expected rates

Flume has a pluggable monitoring framework, but as mentioned at the beginning of the chapter, it is still very much a work in progress. That ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required