Patterns for Fault Tolerant Software by Robert S. Hanmer

Chapter 3. Introduction to the Patterns

This chapter builds on the definitions of Chapter 1 and the Fault Tolerant Mindset of Chapter 2 to provide an introduction to the patterns. It also describes the context that the patterns in the later chapters assume and share.

Four phases of fault tolerance describe the execution-time lifecycle of a fault: error detection, error recovery, error mitigation, and fault treatment. These are shown in Figure 3.1. When a fault activates and an error occurs, the first thing a fault tolerant system must do is detect the error. Detection can happen through a routine means such as an audit (checksum) check, or through special components that are designed to detect when an error has happened.
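
As a minimal sketch of checksum-based detection, the following Python stores a CRC-32 alongside a data record and later audits the record by recomputing it. The record layout and the names `make_record` and `audit` are illustrative assumptions, not something taken from the book.

```python
import zlib


def make_record(payload: bytes) -> dict:
    """Store the payload together with a CRC-32 computed at write time."""
    return {"payload": payload, "checksum": zlib.crc32(payload)}


def audit(record: dict) -> bool:
    """Return True if the stored checksum still matches the payload."""
    return zlib.crc32(record["payload"]) == record["checksum"]


record = make_record(b"balance=100")
print(audit(record))  # the record is intact at this point

# Simulate a fault that corrupts the data without updating the checksum.
record["payload"] = b"balance=900"
if not audit(record):
    print("error detected: payload does not match its checksum")
```

An audit like this only detects the error. Deciding what to do about it belongs to the recovery and mitigation phases described next.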

Once detected, the error must be processed, which is the focus of the next two phases. These phases are executed in real time, so the time they take adds to the system's unavailability. Error recovery works to substitute an error-free system state for the erroneous system state that was detected.

In some cases the error can be removed, or mitigated, without transitioning to a different system state. For example, an erroneous data value can sometimes be corrected so that processing can continue, as opposed to returning the system to a state from which it could attempt to recompute the data value correctly.
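
The difference between recovery and mitigation can be illustrated with a toy example. In this sketch, which assumes a hypothetical `Ledger` class whose only invariant is that `total` equals the sum of `items`, mitigation corrects the bad value in place, while recovery replaces the whole state with a previously saved, error-free copy. None of these names come from the book.

```python
import copy


class Ledger:
    """Toy system whose invariant is total == sum(items)."""

    def __init__(self):
        self.state = {"total": 0, "items": []}
        self._saved = copy.deepcopy(self.state)

    def add(self, amount):
        self.state["items"].append(amount)
        self.state["total"] += amount

    def checkpoint(self):
        # Save a known error-free state for later recovery.
        self._saved = copy.deepcopy(self.state)

    def has_error(self):
        # Error detection: check the invariant.
        return self.state["total"] != sum(self.state["items"])

    def mitigate(self):
        # Mitigation: correct the erroneous value and keep the current state.
        self.state["total"] = sum(self.state["items"])

    def recover(self):
        # Recovery: substitute the saved error-free state for the current one.
        self.state = copy.deepcopy(self._saved)


ledger = Ledger()
ledger.add(5)
ledger.checkpoint()
ledger.add(7)

ledger.state["total"] = 999      # simulated error in a data value
if ledger.has_error():
    ledger.mitigate()            # work since the checkpoint is preserved
print(ledger.state)

ledger.state["total"] = -1       # a second simulated error
if ledger.has_error():
    ledger.recover()             # work since the checkpoint must be redone
print(ledger.state)
```

After mitigation the ledger still contains both items. After recovery it holds only what existed at the checkpoint, so the later addition would have to be repeated.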

Figure 3.1. Four phases of fault tolerance

Fault treatment ...
