Chapter 20. Designing for Monitoring

This chapter builds on the messages in Chapter 8 by looking at two of the most important diagnostic features to design into an application for monitoring purposes: performance counters and event logs. Diagnostics also includes trace files and log files, although these are typically used during incident investigation (see Chapter 21). The incident investigation process is triggered by an event raised during execution and monitoring. System monitoring is primarily performed by the operations team while the system is in live service. For the most part, once the system has stabilized, monitoring system behavior should be a passive activity; that is, the operations team should not have to constantly "watch" the system. While the system is stabilizing, however, the operations team monitors the startup, general execution, shutdown, and batch processes very closely. The diagnostics embedded within the system indicate whether any attention is required.

When I refer to "monitoring," I'm focusing on the application-specific diagnostics, rather than all the other monitoring and operability requirements, such as monitoring the hardware and operating system, or monitoring whether services are running, starting, or stopping. The performance counters and events included within the application code are crucial to the monitoring process.
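To make application-specific diagnostics concrete, here is a minimal sketch, assuming a simple in-process model: the application increments a counter, and once the counter crosses a threshold, a warning event is written to a log to signal that the operations team's attention is required. The names `AppCounter`, `EventLog`, the `FailedLogins` counter, and the threshold value are hypothetical illustrations. This is not the Windows performance counter or event log API, nor the book's own implementation.

```python
import threading
from dataclasses import dataclass, field

# Hypothetical, illustrative types: a stand-in for a platform event log,
# not a real operating-system API.


@dataclass
class Event:
    severity: str
    source: str
    message: str


@dataclass
class EventLog:
    events: list = field(default_factory=list)

    def write(self, severity: str, source: str, message: str) -> None:
        """Record an event that a monitoring tool could pick up."""
        self.events.append(Event(severity, source, message))


class AppCounter:
    """Thread-safe counter that raises one warning event when a threshold is reached."""

    def __init__(self, name: str, threshold: int, log: EventLog) -> None:
        self.name = name
        self.threshold = threshold
        self.log = log
        self.value = 0
        self._alerted = False
        self._lock = threading.Lock()

    def increment(self, amount: int = 1) -> None:
        with self._lock:
            self.value += amount
            current = self.value
            should_alert = current >= self.threshold and not self._alerted
            if should_alert:
                self._alerted = True
        # Write the event outside the lock so logging never blocks counter updates.
        if should_alert:
            self.log.write(
                "Warning",
                self.name,
                f"{self.name} reached {current} (threshold {self.threshold})",
            )


# Usage: five failures against a threshold of 3 produce exactly one warning event.
log = EventLog()
failed_logins = AppCounter("FailedLogins", threshold=3, log=log)
for _ in range(5):
    failed_logins.increment()

print(failed_logins.value)    # 5
print(len(log.events))        # 1
print(log.events[0].message)  # FailedLogins reached 3 (threshold 3)
```

Raising the event only once per threshold crossing keeps monitoring passive: the counter can be sampled at any time, but the operations team is alerted only when something actually needs attention rather than on every increment.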

You can also use third-party products to obtain the performance counter data and events from the system, ...
