Uptime versus Downtime

Generally, if high availability is desired, both uptime and downtime should be addressed. Uptime is a function of components and architecture, whereas downtime is a function of a process that includes monitoring, fault and failure detection, diagnosis, repair preparation, repair processes, testing, and service restoration. To understand these individual terms, consider the steps that need to happen when a service goes down.

Monitoring. If you don’t monitor status and performance, there will be delays in recognizing that there has been an outage.
Detection. Just because you monitor, it does not mean that you will realize that there is an actual problem.
Diagnosis. Realizing that there is a problem does not mean that the root cause will be immediately obvious. Often an issue in one area—say, network congestion—will cause a problem in another—say, service unavailability due to an application time-out.
Repair preparation. The fact that a root cause has been identified does not mean that a repair will occur immediately. For hardware problems, spare parts may need to be ordered or correctly retrieved from spares inventory; for software problems, a patch may need to be written.
Repair. The repair process may require time: disassembling components, shutting down zones, and so forth.
Testing. Ensuring that the repair was conducted properly and that the component, subsystem, or system is ready for use requires time as well.
Restoration. Finally, a cutover of ...
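
To make the relationship between these phases and availability concrete, here is a minimal sketch in Python. It assumes that the phases occur one after another, so that total downtime is the sum of their durations, and it uses the common definition of availability as uptime divided by uptime plus downtime. The phase durations and the function names are hypothetical, chosen only for illustration; they do not come from the book.

```python
# Hypothetical durations, in minutes, for each phase of a single outage.
# These figures are illustrative assumptions, not measured data.
PHASES_MINUTES = {
    "monitoring": 5,
    "detection": 10,
    "diagnosis": 45,
    "repair preparation": 120,
    "repair": 30,
    "testing": 20,
    "restoration": 10,
}


def total_downtime(phases):
    """Sum the phase durations, assuming the phases run sequentially."""
    return sum(phases.values())


def availability(uptime_minutes, downtime_minutes):
    """Fraction of time the service is up: uptime / (uptime + downtime)."""
    return uptime_minutes / (uptime_minutes + downtime_minutes)


def longest_phase(phases):
    """Name of the phase that contributes the most downtime."""
    return max(phases, key=phases.get)


if __name__ == "__main__":
    downtime = total_downtime(PHASES_MINUTES)
    print("Total downtime (minutes):", downtime)
    print("Longest phase:", longest_phase(PHASES_MINUTES))
    # Assumed uptime between outages, in minutes (hypothetical).
    print("Availability:", availability(9760, downtime))
```

With these assumed numbers, repair preparation dominates the outage, which suggests that shortening any single phase (for example, by keeping spares on hand) reduces downtime without touching the components or architecture that determine uptime.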


Cloudonomics: The Business Value of Cloud Computing, + Website by Joe Weinman
