Conclusion

This chapter covered many techniques and concepts that you can use to improve the development–operations relationship. The most important is the understanding that site stability is everyone's responsibility, not something to be relegated to the operations team to handle on its own.

This is the case at Flickr. The team has many convictions about the best way to support and improve the product. The most deeply ingrained is that site uptime is more important than everything except the security and privacy of our members' data. This holds well beyond engineering: every team member, from design and product to ad sales and business development, is willing to drop what she's doing (even after hours) if there's something she can do to help keep the site up and running.

Thinking in this way removes the pressure to compromise on stability in the interests of some other goal. It leads to features that have been designed and built at every level to gracefully handle partial outages and provide switches and information that will be useful when things go wrong. It means that when things do break significantly, the response is coordinated and well thought through.

But if responsibility for site stability no longer rests with operations alone, where does that leave operations? Does this mean there's no role for the traditional operations team?

Even with the changes we've discussed, the members of the operations team remain the experts on site uptime. They're probably the ...
