Chapter 2. The Road to Resilience and Reliability

If you build or operate an important application, large or small, the thing you care about is that it works. Under ideal circumstances, this is not very difficult. But ideal circumstances are a dream: every environment has failures. The question is how to deal with them.

This problem is not new. Traditionally, it was the IT department’s job to resolve it. For several reasons, that is changing. Operations is increasingly part of software, and building infrastructure is software engineering.

To introduce the book, we’ll first discuss our strategy. If infrastructure is software, we can apply our software engineering principles to it.

Once Upon a Time, There Was a Mason

One of the most important problems that software engineers have to solve is how to reuse work (code). Reusing code means you reduce the size of the code base, and as a consequence, there is less work in development and testing. Maintenance is also much more effective; multiple projects can benefit from improvements in one piece of code.

There are many solutions to the problem of reusing code. Structured programming introduced methods (functions and procedures). Later, object-oriented programming added more tools for the same challenge, such as objects, classes, and inheritance.
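As a small illustration of these two reuse mechanisms, the sketch below writes one piece of logic as a function and reuses it from a base class whose behavior two subclasses inherit. All names (`format_endpoint`, `Service`, `WebService`, `CacheService`) and the host names are hypothetical, chosen only for the example.

```python
# Hypothetical example of the two reuse mechanisms described above.

# 1. Structured programming: a function written once and called from many places.
def format_endpoint(host: str, port: int) -> str:
    """Build a "host:port" string."""
    return f"{host}:{port}"


# 2. Object-oriented programming: behavior defined once in a base class
#    and inherited by subclasses.
class Service:
    default_port = 80

    def __init__(self, host: str) -> None:
        self.host = host

    def endpoint(self) -> str:
        # Reuses the function above instead of duplicating its logic.
        return format_endpoint(self.host, self.default_port)


class WebService(Service):
    # Only the difference is written; endpoint() is inherited unchanged.
    default_port = 443


class CacheService(Service):
    default_port = 6379


if __name__ == "__main__":
    for svc in (WebService("web.example.com"), CacheService("cache.example.com")):
        print(svc.endpoint())
```

A fix made in `format_endpoint` or `Service.endpoint` benefits every caller and subclass at once, which is the maintenance advantage described earlier.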

There are also domain-specific tools and environments, often called frameworks. These frameworks offer a structure ...
