Failover Testing

BCP plans are just that: plans. To turn your failover plans into reality, you need to actually execute them. A full peak-traffic failover can be a daunting task. So daunting, in fact, that I have seen many websites sit through hours of downtime, hoping the affected datacenter would magically come back, rather than go through a failover process that might or might not work. It's much better to test your procedures when you're not in an emergency, so that if something goes wrong, you can fall back. By testing early and often, you'll gain the comfort and experience to do the right thing swiftly when disaster strikes.

Not only is regular failover testing an essential part of disaster preparedness, but it can also be a valuable tool in the day-to-day operations of your website. How many times have you pushed a new code release, only to find that it breaks the site? A much better release mechanism is to fail traffic out of one datacenter, upgrade the software in the now-cold site, QA it appropriately, and only then fail traffic back in. Then repeat with your other datacenter. This way, if something goes wrong during the upgrade, your users won't be affected. Failing over also makes sense as a way to reduce the risk of routine maintenance. Need to swap out one of your redundant routers? Fail out of the datacenter first, in case the other router goes bad. Testing your UPS? Fail out, lest things go wrong. The more you use your BCP plan, the better it will become. ...
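The rolling release described above can be sketched as a loop over datacenters. This is a minimal, hypothetical model: `Datacenter`, `rolling_release`, and the caller-supplied `qa_check` are illustrative names rather than part of any real tool, and "failing traffic out" is modeled as flipping a flag rather than reconfiguring a real load balancer or DNS.

```python
from dataclasses import dataclass


@dataclass
class Datacenter:
    name: str
    version: str
    serving: bool = True  # whether user traffic is currently routed here


def rolling_release(datacenters, new_version, qa_check):
    """Upgrade one datacenter at a time, only while it is drained of traffic.

    qa_check is a hypothetical caller-supplied function that returns True if
    the upgraded (still cold) site passes QA. Returns True if every site was
    upgraded and returned to service, False if QA failed somewhere.
    """
    for dc in datacenters:
        others = [d for d in datacenters if d is not dc]
        # Refuse to drain the last serving site: users need somewhere to go.
        if not any(d.serving for d in others):
            raise RuntimeError(f"cannot drain {dc.name}: no other site is serving")

        dc.serving = False            # fail traffic out of this datacenter
        old_version = dc.version
        dc.version = new_version      # upgrade the now-cold site

        if not qa_check(dc):
            # Roll back and leave the site drained for investigation; the
            # remaining sites keep serving, so users are not affected.
            dc.version = old_version
            return False

        dc.serving = True             # QA passed: fail traffic back in
    return True


if __name__ == "__main__":
    sites = [Datacenter("east", "1.0"), Datacenter("west", "1.0")]
    ok = rolling_release(sites, "2.0", qa_check=lambda dc: True)
    print(ok, sites)
```

The guard that refuses to drain the last serving site is the point of the whole exercise: the upgrade only ever touches a datacenter that users have already been moved away from. Note that if QA fails partway through, sites upgraded earlier keep running the new version, so a real procedure would also need a plan for that mixed state.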
