Chapter 8. Monitoring and Alerts

The environments of many organizations are complex, consisting of many different servers and applications. You've already seen what a typical production environment can look like, and that's just for one system. Monitoring is integral to keeping such an environment effective. Knowing when something is or isn't functioning correctly, or requires manual intervention, is critical to the operation of the business and to the service its customers receive, and effective monitoring helps provide that knowledge. As mentioned in Chapter 7, the operations team is primarily responsible for monitoring all the systems in live service. This chapter looks at monitoring the different environments, servers, and applications.

This chapter is organized into the following sections:

  • What Is Monitoring? Examines the monitoring architecture and the various rules that can be put in place to filter and escalate information. It also discusses monitoring blackout windows, which are used to suppress alerts during specified periods of time.

  • What Is Monitored? Examines the various monitoring sources as well as some typical examples of application and server monitoring. It also discusses the types of events that are captured, performance counters provided by Windows and by third-party applications, and custom performance counters updated by the application.

  • What Are Alerts? Examines how alerts trigger incident investigation.
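
As a rough illustration of the blackout-window idea mentioned in the first section above, the following Python sketch shows one way alerts could be suppressed during a configured period. It is a minimal sketch under stated assumptions, not the approach the chapter itself prescribes: the names BlackoutWindow, covers, and should_raise_alert are hypothetical, and a window is assumed to be a daily time-of-day range that may wrap past midnight.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Iterable


@dataclass(frozen=True)
class BlackoutWindow:
    """Hypothetical daily period during which alerts are suppressed."""
    start: time  # inclusive
    end: time    # exclusive

    def covers(self, moment: datetime) -> bool:
        t = moment.time()
        if self.start <= self.end:
            # Ordinary window within a single day, e.g. 01:00 to 03:00.
            return self.start <= t < self.end
        # Window that wraps past midnight, e.g. 23:00 to 01:00.
        return t >= self.start or t < self.end


def should_raise_alert(raised_at: datetime,
                       windows: Iterable[BlackoutWindow]) -> bool:
    """Return True when no blackout window covers the alert's timestamp."""
    return not any(window.covers(raised_at) for window in windows)


# Assumed example: a nightly maintenance window from 01:00 to 03:00.
maintenance = BlackoutWindow(start=time(1, 0), end=time(3, 0))
during = should_raise_alert(datetime(2024, 1, 15, 2, 0), [maintenance])
after = should_raise_alert(datetime(2024, 1, 15, 4, 0), [maintenance])
```

Here `during` is False, because the alert falls inside the window and is filtered out, and `after` is True. A real monitoring product would typically scope windows to particular servers or applications as well. This sketch considers only the time of day.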
