Thinking in terms of the cloud, and not infrastructure

We will now describe a real incident that took place in a datacenter in late December, 2011, when dozens of alerts were received from our live monitoring system. This was a result of losing connectivity to the datacenter. In response to this, administrator rushed to the Network Operations Center (NOC), hoping that it was only a small glitch in the monitoring system. With so much redundancy, we may wonder how everything can go offline. Unfortunately, the big monitoring screens in the NOC room were all red, which is not a good sign. This was the beginning of a very long nightmare.

As it happens, this was caused by an electrician who was working in the datacenter and mistakenly triggered ...

Get Effective DevOps with AWS - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.