Preface

DESIGNING, BUILDING, AND MAINTAINING A GROWING WEBSITE has unique challenges when it comes to the fields of systems administration and software development. For one, the Web never sleeps. Because websites are globally used, there is no "good" time for changes, upgrades, or maintenance windows, only fewer "bad" times. This also means that outages are guaranteed to affect someone, somewhere using the site, no matter what time it is.

As web applications become an increasing part of our daily lives, they are also becoming more complex. With that complexity come more parts to build and maintain and, unfortunately, more parts to fail. On top of that, there are requirements for being fast, secure, and always available across the planet. All these things add up to what has become a specialized field of engineering: web operations.

This book was conceived to gather insights into this still-evolving field from web veterans around the industry. Jesse Robbins and I came up with a list of tip-of-iceberg topics and asked these experts for their hard-earned advice and stories from the trenches.

How This Book Is Organized

The chapters in this book are organized as follows:

Chapter 1, by Theo Schlossnagle, describes what this field actually encompasses and underscores how the skills it requires are gained through experience more than through formal education.

Chapter 2, by Justin Huff, explains how Picnik.com went about deploying and sustaining its infrastructure on a mix of on-premise hardware and cloud services.

Chapter 3, by Matt Massie and me, discusses the importance of gathering metrics from both your application and your infrastructure, and considerations on how to gather them.

Chapter 4, by Eric Ries, gives his take on the advantages of deploying code to production frequently and in small batches.

Chapter 5, by Adam Jacob, gives an overview of the theory and approaches for configuration and deployment management.

Chapter 6, by Patrick Debois, discusses the various considerations when designing a monitoring system.

Chapter 7 is Dr. Richard Cook's whitepaper on systems failure and the nature of complexity that is often found in web architectures. He also adds some web operations–specific notes to his original paper.

Chapter 8 is my interview with Heather Champ on how outages and degradations should be handled on the human side of things.

Chapter 9, by Brian Moon, talks about the experiences with huge traffic deluges at Dealnews.com and what the team did to mitigate disaster.

Chapter 10, by Paul Hammond, lists some of the places where development and operations can come together to enable the business, both technically and culturally.

Chapter 11, by Alistair Croll and Sean Power, discusses metrics that can be used to illustrate what the real experience of your site is for its users.

Chapter 12, by Baron Schwartz, lays out common approaches to database architectures and some pitfalls that come with increasing scale.

Chapter 13, by Jake Loomis, goes into what makes or breaks a good postmortem and root cause analysis process.

Chapter 14, by Anoop Nagwani, explores the gamut of approaches and considerations when designing and maintaining storage for a growing web application.

Chapter 15, by Eric Florenzano, lists considerations for and advantages of using a growing number of "nonrelational" database technologies.

Chapter 16, by Andrew Clay Shafer, discusses the human and process sides of operations, and how agile philosophy and methods map (or not) to the operational space.

Chapter 17, by Mike Christian, takes you through the various levels of availability, along with Business Continuity Planning (BCP) approaches and their dangers.
