Basic System Outage Principles

A system fault can be caused by internal or external factors. Internal factors include specification or design errors, manufacturing defects, component defects, and component wear-out. External factors include radiation, electromagnetic interference, operator error, and natural disasters.

Regardless of how well a system is designed or how reliable its components are, failures cannot be eliminated completely. However, failures can be managed so that their impact on the system is minimized.

An error is the occurrence of a system fault in the form of an incorrect binary output. If an error prevents a system (or subsystem) from performing its intended function, a failure has ...
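
The distinction between a fault and an error can be illustrated with a small sketch. It is not taken from the book: the function names, the stuck-at-0 fault model, and the 4-bit values are all assumptions chosen for illustration. The fault is always present, but an error (an incorrect binary output) appears only when the data exercises the faulty bit.

```python
def apply_stuck_at_zero(value: int, bit: int) -> int:
    """Hypothetical fault model: the given bit of the output is stuck at 0."""
    return value & ~(1 << bit)


def classify(expected: int, observed: int) -> str:
    """Report an error only when the observed binary output is incorrect."""
    return "error" if observed != expected else "no error"


# The same fault (bit 0 stuck at 0) applied to two illustrative values.
for expected in (0b0110, 0b0111):
    observed = apply_stuck_at_zero(expected, bit=0)
    print(f"{expected:04b} -> {observed:04b}: {classify(expected, observed)}")
```

With the first value, bit 0 is already 0, so the fault stays latent and the output is correct. With the second, the fault changes the output and shows up as an error. Whether that error amounts to a failure depends on whether it prevents the system from performing its intended function.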

