Once you have your HBase cluster up and running, it is essential to continuously ensure that it is operating as expected. This chapter explains how to monitor the status of the cluster with a variety of tools.
Production systems typically expose a large number of metrics that describe their current status, and monitoring those metrics is vital. HBase is no exception.
HBase actually inherits its monitoring APIs from Hadoop. But while Hadoop is a batch-oriented system and therefore is often not directly user-facing, HBase is user-facing: it serves random access requests that, for example, drive a website. The response times of these requests should stay within specific limits, commonly formalized as a service-level agreement (SLA), to guarantee a positive user experience.
With distributed systems, the administrator faces the difficult task of making sense of the overall status of the system while looking at each server separately. And even with a single-server system, it is difficult to know what is going on when all you have to go by is a handful of raw logfiles. When disaster strikes, it would be good to see where and when it all started. But digging through megabytes, gigabytes, or even terabytes of text-based files to find the proverbial needle in the haystack is something only a few people have mastered. And even if you have mad log-reading skills, it will take time to draw and test hypotheses to eventually ...