Summary

Real-time operations comprise a set of key functions that must operate within tight time constraints. Information flows into the real-time operations system from two sources: the instrumentation manager, which supplies alert data, and the SLA statistics modules, which provide time-sliced measurements of performance. The real-time operations system processes these inputs with two goals. The first is to improve mean time between failures (MTBF), in some cases by using proactive techniques to predict failures before they happen. The second is to help the operations staff decrease mean time to repair (MTTR) when a failure does occur.
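
As a rough illustration of the two metrics named above, the following sketch computes MTBF and MTTR from a list of outage records. It is not taken from the book: the Outage record, the function names, the sample data, and the choice to define MTBF as total uptime divided by the number of failures are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """Hypothetical outage record; times are hours since the window began."""
    start: float  # when the failure was detected
    end: float    # when service was restored


def total_downtime(outages):
    """Sum of the repair durations for all outages."""
    return sum(o.end - o.start for o in outages)


def mttr(outages):
    """Mean time to repair: average outage duration, in hours."""
    if not outages:
        raise ValueError("MTTR is undefined when there are no outages")
    return total_downtime(outages) / len(outages)


def mtbf(outages, window_hours):
    """Mean time between failures, assumed here to be uptime per failure."""
    if not outages:
        raise ValueError("MTBF is undefined when there are no outages")
    return (window_hours - total_downtime(outages)) / len(outages)


def availability(outages, window_hours):
    """Fraction of the window the service was up: MTBF / (MTBF + MTTR)."""
    up = mtbf(outages, window_hours)
    down = mttr(outages)
    return up / (up + down)


# Made-up sample data: three outages in a 30-day (720-hour) window.
outages = [Outage(100.0, 102.0), Outage(400.0, 401.0), Outage(650.0, 653.0)]
window_hours = 720.0

print("MTTR (hours):", mttr(outages))
print("MTBF (hours):", mtbf(outages, window_hours))
print("Availability:", round(availability(outages, window_hours), 4))
```

Under these assumptions, shortening repairs lowers MTTR and preventing failures raises MTBF, and either change raises the availability figure. That is why the two goals can be pursued in parallel.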

Reactive management, used to decrease MTTR, is based on the use of triage and root-cause analysis. Triage tries to identify the responsible organization very quickly, ...
