Chapter 6. Design Patterns for Resiliency

Success is not final, failure is not fatal: it is the courage to continue that counts.

—Winston Churchill

Resiliency is a system’s ability to constructively deal with failures. A resilient system detects failure and routes around it. Nonresilient systems fall down when faced with a malfunction. This chapter is about software-based resiliency and documents the most common techniques used.

Resiliency is important because no one goes to a web site that is down. Hardware fails—that is a fact of life. You can buy the most reliable, expensive hardware in the world and there will be some amount of failures. In a sufficiently large system, a one in a million failure is a daily occurrence.

During the first year ...

Get Practice of Cloud System Administration, The: DevOps and SRE Practices for Web Services, Volume 2 now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.