Introduction

An Imperfect World

We live in an imperfect world. The things we make break when we least expect them to. This includes computer programs and the systems that we build from computers. Even the things that we think of as being the most reliable are occasionally unavailable because they've broken. This book is about how to make these systems of software (and hardware) work even though they might break occasionally.

Consider the United States' manned space flight program: Apollo 13 had a dramatic failure that almost killed the three-person crew as they were heading toward the moon. Think also of the failures detected during space shuttle assembly that delay a launch by days. These space systems, which are highly complicated combinations of hardware and software components, were designed to operate flawlessly, and yet failures happened.

Consider also the WYSIWYG document editing program that just won't let you number the first page of a document, such as this book manuscript, as page one. Both the program's creators and its users expect page numbering to work flawlessly.

Or consider systems such as telephone switching equipment or web-based e-commerce systems or automatic teller machines (ATMs). These are expected to work flawlessly and continuously. They are built of combinations of hardware and software components that work together to provide the desired service.

This book is about what to design into software to make these complicated systems ...

