Chapter 14. Oncall

Be alert... the world needs more lerts.

—Woody Allen

Oncall is the way we handle exceptional situations. Even though we try to automate all operational tasks, there will always be responsibilities and edge cases that cannot be automated away. These exceptional situations can happen at any time of the day; they do not schedule themselves nicely between the hours of 9 AM and 5 PM.

Exceptional situations are, in brief, outages and anything that, if left unattended, would lead to an outage. More specifically, they are situations where the service is, or will become, in violation of the SLA.
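To make the "is, or will become, in violation" distinction concrete, here is a minimal sketch. It assumes availability is the SLA metric and uses a simple linear projection. The names `ServiceStatus` and `needs_oncall`, the 99.9% target, and the four-hour horizon are all hypothetical choices made for illustration and do not come from the text.

```python
from dataclasses import dataclass


@dataclass
class ServiceStatus:
    # Measured availability over the SLA window, e.g. 0.9995 (hypothetical metric).
    availability: float
    # Change in availability per hour; negative means the service is degrading.
    trend_per_hour: float


def needs_oncall(status: ServiceStatus,
                 sla_target: float = 0.999,
                 horizon_hours: float = 4.0) -> bool:
    """Return True if the SLA is violated now or is projected to be soon."""
    if status.availability < sla_target:
        # Already in violation: this is an outage.
        return True
    # Naive linear projection; an assumption, not a recommended forecasting model.
    projected = status.availability + status.trend_per_hour * horizon_hours
    return projected < sla_target
```

A service already below target and a service that is healthy but trending toward a violation both return `True`. A stable service above target returns `False`.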

An operations team needs a strategy to ensure that exceptional situations are attended to promptly and receive appropriate action. The strategy ...

