O'Reilly logo

Practice of Cloud System Administration, The: DevOps and SRE Practices for Web Services, Volume 2 by Christina J. Hogan, Strata R. Chalup, Thomas A. Limoncelli

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 15. Disaster Preparedness

Failure is not falling down but refusing to get back up.

—Theodore Roosevelt

Disasters and major outages happen. Everyone in the company from the top down needs to recognize that fact and adopt a mindset that accepts outages and learns from them. An operations organization needs to be able to handle outages well and avoid repeating past mistakes.

Previously we’ve examined technology related to being resilient to failures and outages as well as organizational strategies like oncall. In this chapter we discuss disaster preparedness at the individual, team, procedural, and organizational levels. People must be trained so that they know the procedure well enough that they can execute it with confidence. Teams need ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required