How Many 9s?

Ask any executive how much downtime is acceptable, and you will get the same answer: "None." Tell the executive what that will cost, in both money and performance, and the answer might change. Five 9s is the gold standard for availability, championed mainly by telcos, but that's primarily due to lifeline requirements. Most Internet applications haven't yet reached life-or-death availability requirements. Five 9s remains a worthwhile goal for web operators, and critical parts of the infrastructure can (and should) be architected to those levels, but for individual applications it may be overkill. The computers that end users constantly fight with, their Internet connections, and their cheap, at-home NATs don't come anywhere close to five 9s, so a few minutes of server disruption here or there often goes unnoticed.
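To put numbers on the 9s, it helps to convert an availability target into a yearly downtime budget. The following is a minimal sketch of that arithmetic; the function name `allowed_downtime_minutes` is made up for illustration, and the calculation assumes a 365-day year and counts every minute of the year as in scope (no planned maintenance windows excluded).

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget, in minutes, for an availability of `nines` 9s.

    Two 9s means 99%, three 9s means 99.9%, and so on, so the
    unavailable fraction of the year is 10 ** -nines.
    """
    unavailable_fraction = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for n in range(2, 6):
        availability_pct = 100 * (1 - 10 ** (-n))
        budget = allowed_downtime_minutes(n)
        print(f"{n} nines ({availability_pct:.3f}%): {budget:,.1f} minutes/year")
```

Under these assumptions, five 9s leaves a budget of a little over five minutes per year, while three 9s leaves nearly nine hours.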

Instead of focusing on 9s, it's beneficial to analyze your recovery-time objective (RTO) and recovery-point objective (RPO). RTO is how long you can afford to take to get the site back up after an outage. RPO is how much data you are willing to lose. Often, the two goals are in competition. If you have a zero RPO and your replication lags by a few minutes, you may choose to take a multihour outage while you get the primary back up, rather than lose those few minutes of data by failing over. On the other hand, if you have a zero RTO, you may decide to fail over immediately, accepting the loss of a few in-flight transactions. The only way to get both is to sacrifice performance by ...
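The tension between the two objectives can be sketched as a small decision function. This is an illustration only, not a recommended failover policy: the function name, its parameters, and the returned labels are hypothetical, and it assumes that failing over is effectively instantaneous and loses exactly the data covered by the current replication lag.

```python
def choose_action(replication_lag_s: float, rpo_s: float,
                  repair_time_s: float, rto_s: float) -> str:
    """Pick a recovery action for a failed primary (illustrative only).

    replication_lag_s: seconds of data not yet on the replica
    rpo_s:             seconds of data loss the business will accept
    repair_time_s:     estimated seconds to bring the primary back up
    rto_s:             seconds of downtime the business will accept
    """
    if replication_lag_s <= rpo_s:
        # Failing over loses no more data than the RPO allows,
        # and (by assumption) restores service almost immediately.
        return "fail_over"
    if repair_time_s <= rto_s:
        # Failing over would lose too much data, but waiting for the
        # primary loses none and still fits within the RTO.
        return "repair_primary"
    # Neither option satisfies both objectives: the goals are in
    # competition and someone has to decide which one gives way.
    return "conflict"


if __name__ == "__main__":
    three_min, four_hours = 180, 4 * 3600
    # Zero RPO, generous RTO: sit out the multihour outage.
    print(choose_action(three_min, rpo_s=0, repair_time_s=four_hours, rto_s=8 * 3600))
    # Zero RTO, some data loss tolerated: fail over right away.
    print(choose_action(three_min, rpo_s=300, repair_time_s=four_hours, rto_s=0))
    # Zero RPO and zero RTO with lagging replication: no option meets both.
    print(choose_action(three_min, rpo_s=0, repair_time_s=four_hours, rto_s=0))
```

The third case is the one the paragraph above warns about: with lagging replication, a zero RPO and a zero RTO cannot both be honored by choosing between these two actions.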
