The previous chapter defined the basic vocabulary and the four phases of fault tolerance. This chapter will look at techniques to design for fault tolerance and enhanced reliability and availability.
What can go wrong in any given situation? That is a key question for anyone trying to develop fault tolerant software. Thinking to ask the question, and to define the solution, is called having a Fault Tolerant Mindset. In almost any situation something can go wrong, and a fault tolerant program is prepared for these errors. Asking what-if questions and planning during design for the errors that might happen during execution are the hallmarks of the Fault Tolerant Mindset. What if the stack pointer becomes negative? What if the wrong subclass is instantiated? What if the message arrives out of order?
Applying a Fault Tolerant Mindset to all stages of software development is beneficial. This includes requirements definition and test development as well as the traditional phases of software creation (architecture, design, coding).
'Every problem in computer science boils down to tradeoffs' – Professor L. J. Henschen.
Mean Time To Failure (MTTF) and Mean Time To Repair (MTTR) determine the reliability and availability of a system. These two parameters can be traded off against each other: a system can reach a given availability either by failing less often or by recovering faster. In some contexts, MTTR is the more important attribute, especially if the system is striving for high availability. Examples include ...
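The tradeoff between MTTF and MTTR can be made concrete with a small calculation. The sketch below assumes the commonly used steady-state relation, availability = MTTF / (MTTF + MTTR), which the text above does not state explicitly. The function names and the numbers (1000 hours, 1 hour) are illustrative assumptions, not measurements from any real system.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours; ignores leap years


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability, assuming A = MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(mttf_hours: float, mttr_hours: float) -> float:
    """Expected hours of downtime per year under the same assumption."""
    return (1.0 - availability(mttf_hours, mttr_hours)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical baseline: fails every 1000 hours, takes 1 hour to repair.
    scenarios = {
        "baseline": (1000.0, 1.0),
        "double MTTF": (2000.0, 1.0),  # fail half as often
        "halve MTTR": (1000.0, 0.5),   # recover twice as fast
    }
    for name, (mttf, mttr) in scenarios.items():
        print(f"{name:12s} availability={availability(mttf, mttr):.6f} "
              f"downtime={downtime_hours_per_year(mttf, mttr):.2f} h/year")
```

Under this model, doubling MTTF and halving MTTR yield the same availability, because only the ratio of the two matters. This is one way to see why a system striving for high availability may find it more practical to invest in faster recovery than in eliminating failures.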