The Quality Defenders' Lament

One large source of waste in development is double-checking. For example, imagine a team operating in a traditional waterfall development system, without continuous deployment, TDD, or continuous integration. When a developer wants to check in code, or an ops staff member thinks he's ready to push a release, this is a very scary moment. He has a choice: do it now, or double-check to make sure everything still works and looks good. Both options are attractive. If he proceeds now, he can claim the rewards of being done sooner. On the other hand, if he causes a problem, his previous speed will be counted against him. Why didn't he spend just another five minutes making sure he didn't cause that problem? In practice, how people respond to this dilemma is determined by their incentives, which are driven by the culture of their team. How severely is failure punished? Who will ultimately bear the cost of their mistakes? How important are schedules? Does the team value finishing early?

But the thing to notice in this situation is that there is really no right answer. People who agonize over the choice reap the worst of both worlds. As a result, people will tend toward two extremes: those who believe in getting things done as fast as possible, and those who believe that work should be carefully checked. Any intermediate position is untenable over the long term. When things go wrong, any nuanced explanation of the trade-offs involved is going to sound unsatisfying. After all, you could have acted a little sooner or a little more carefully, if only you'd known what the problem was going to be in advance. Viewed through the lens of hindsight, most of those judgments look bad. On the other hand, an extreme position is much easier to defend. Both extremes have built-in excuses: "Sure there were a few bugs, but I consistently overdeliver on an intense schedule, and it's well worth it," or "I know you wanted this done sooner, but you know I only ever deliver when it's absolutely ready, and it's well worth it."

These two extreme positions lead to factional strife, which is extremely unpleasant. Managers start to make a note of who's part of which faction and then assign projects accordingly. Got a crazy last-minute feature? Get the Cowboys to take care of it—and then let the Quality Defenders clean it up in the next release. Both sides start to think of their point of view in moralistic terms: "Those guys don't see the economic value of fast action, they only care about their precious architecture diagrams," or "Those guys are sloppy and have no professional pride." Having been called upon to mediate these disagreements many times in my career, I can attest to just how wasteful they are.

However, they are completely logical outgrowths of a large-batch-size development process that forces developers to make trade-offs between time and quality, using the old "time-quality-money, pick two" fallacy (http://startuplessonslearned.com/2008/10/engineering-managers-lament.html). Because feedback is slow in coming, the damage caused by a mistake is felt long after the decisions that caused the mistake were made, making learning difficult. Because everyone gets ready to integrate with the release batch around the same time (there being no incentive to integrate early), conflicts are resolved under extreme time pressure. Features are chronically on the bubble, about to get deferred to the next release. But when they do get deferred, they tend to have their scope increased ("After all, we have a whole release cycle, and it's almost done..."), which leads to yet another time crunch, and so on. And of course, the code rarely performs in production the way it does in the testing or staging environment, which leads to a series of hotfixes immediately following each release. These come at the expense of the next release batch, meaning that each release cycle starts off behind.

You can't change the underlying incentives of this situation by getting better at any one activity. Better release planning, estimating, architecting, or integrating will only mitigate the symptoms. The only traditional technique for solving this problem is to add in massive queues in the forms of schedule padding, extra time for integration, code freezes, and the like. In fact, most organizations don't realize just how much of this padding is already going on in the estimates that individual contributors learn to generate. But padding doesn't help, because it serves to slow down the whole process. And as all development teams will tell you, time is always short. In fact, excess time pressure is exactly why they think they have these problems in the first place.

So, we need to find solutions that operate at the system level to break teams out of this pincer action. The Agile software movement has made numerous contributions: continuous integration, which helps accelerate feedback about defects; story cards and Kanban that reduce batch size; a daily stand-up that increases tempo. Continuous deployment is another such technique, one with a unique power to change development team dynamics for the better.

Why Does It Work?

First, continuous deployment separates two different definitions of the term release. One is used by engineers to refer to the process of getting code fully integrated into production. Another is used by marketing to refer to what customers see. In traditional batch-and-queue development, these two concepts are linked. All customers will see the new software as soon as it's deployed. This requires that all of the testing of the release happens before it is deployed to production, in special staging or testing environments. And this leaves the release vulnerable to unanticipated problems during this window of time: after the code is written but before it's running in production. On top of that, conflating the marketing release with the technical release dramatically increases the amount of coordination required to ship something.

Under continuous deployment, as soon as code is written it's on its way to production. That means we are often deploying just 1% of a feature—long before customers would want to see it. In fact, most of the work involved with a new feature is not the user-visible parts of the feature itself. Instead, it's the millions of tiny touch points that integrate the feature with all the other features that were built before. Think of the dozens of little API changes that are required when we want to pass new values through the system. These changes are generally supposed to be "side-effect free," meaning they don't affect the behavior of the system at the point of insertion—emphasis on supposed. In fact, many bugs are caused by unusual or unnoticed side effects of these deep changes. The same is true of small changes that only conflict with configuration parameters in the production environment. It's much better to get this feedback as soon as possible, which continuous deployment offers.
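One common way to keep the technical release separate from the marketing release is to gate customer-visible behavior behind a flag. The sketch below is only an illustration of that idea, not a description of any particular team's system: the `ReleaseFlags` class, the `badges` feature, and both functions are hypothetical names invented for this example. The new value is plumbed through on every request, so its integration points are exercised in production, while customers see nothing until the flag is switched on.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ReleaseFlags:
    """Hypothetical in-memory record of which features customers may see."""
    visible: Set[str] = field(default_factory=set)

    def release(self, feature: str) -> None:
        # The "marketing release": a switch flipped long after the code shipped.
        self.visible.add(feature)

    def is_visible(self, feature: str) -> bool:
        return feature in self.visible


def load_user(record: Dict) -> Dict:
    # The "technical release": this plumbing is deployed and runs on every
    # request, passing the new "badges" value through the system.
    badges: List[str] = record.get("badges", [])
    return {"name": record["name"], "badges": badges}


def render_profile(user: Dict, flags: ReleaseFlags) -> str:
    lines = ["Profile: " + user["name"]]
    # Customers only see the new feature once it has been released.
    if flags.is_visible("badges"):
        lines.append("Badges: " + ", ".join(user["badges"]))
    return "\n".join(lines)
```

With a fresh `ReleaseFlags()`, `render_profile` returns only the name line even though `load_user` already carries the badges; after `flags.release("badges")`, the same deployed code shows the extra line.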

Continuous deployment also acts as a speed regulator. Every time the deployment process encounters a problem, a human being needs to get involved to diagnose it. During this time, it's intentionally impossible for anyone else to deploy. When teams are ready to deploy, but the process is locked, they become immediately available to help diagnose and fix the deployment problem (the alternative, in which they continue to generate but not deploy new code, just serves to increase batch sizes to everyone's detriment). This speed regulation is a tricky adjustment for teams that are accustomed to measuring their progress via individual efficiency. In such a system, the primary goal of each engineer is to stay busy, using as close to 100% of his time for coding as possible. Unfortunately, this view ignores the team's overall throughput. Even if you don't adopt a radical definition of progress, such as the "validated learning about customers" definition (http://startuplessonslearned.com/2009/04/validated-learning-about-customers.html) that I advocate, it's still suboptimal to keep everyone busy. When you're in the midst of integration problems, any code that someone is writing is likely to have to be revised as a result of conflicts. The same is true with configuration mismatches or multiple teams stepping on one another's toes. In such circumstances, it's much better for overall productivity for people to stop coding and start talking. Once they figure out how to coordinate their actions so that the work they are doing doesn't have to be reworked, it's productive to start coding again.
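The locking behavior described above can be sketched in a few lines. This is a simplified model under stated assumptions, not a real deployment tool: `DeployPipeline`, `DeployLocked`, and the `checks_pass` callback are hypothetical names, and a real system would run tests and monitoring where the callback stands in.

```python
from typing import Callable, List, Optional


class DeployLocked(Exception):
    """Raised when someone tries to deploy while a failure is unresolved."""


class DeployPipeline:
    """Hypothetical pipeline that halts all deploys after a failed one."""

    def __init__(self) -> None:
        self.locked_by: Optional[str] = None  # change that caused the lock
        self.deployed: List[str] = []

    def deploy(self, change: str, checks_pass: Callable[[str], bool]) -> bool:
        # While a problem is being diagnosed, nobody else may deploy.
        if self.locked_by is not None:
            raise DeployLocked("blocked until '%s' is resolved" % self.locked_by)
        if not checks_pass(change):
            # A failure locks the pipeline until a human intervenes.
            self.locked_by = change
            return False
        self.deployed.append(change)
        return True

    def resolve(self) -> None:
        # Called by a person once the failure has been diagnosed and fixed.
        self.locked_by = None
```

In this model, a failed `deploy` leaves the pipeline locked, every later `deploy` raises `DeployLocked`, and only an explicit `resolve()` lets work flow again. That is the point at which blocked engineers are free to help with the diagnosis instead of piling up undeployed code.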

Returning to our development team divided into Cowboy and Quality factions, let's take a look at how continuous deployment can change the calculus of their situation. For one, continuous deployment fosters learning and professional development—on both sides of the divide. Instead of having to argue with each other about the right way to code, each individual has an opportunity to learn directly from the production environment. This is the meaning of the axiom to "let your defects be your teacher."

If an engineer has a tendency to ship too soon, he will tend to find himself grappling with the cluster immune system (http://startuplessonslearned.com/2008/09/just-in-time-scalability.html), continuous integration server, and Five Whys master more often. These encounters, far from being the high-stakes arguments inherent in traditional teams, are actually low-risk, mostly private or small-group affairs. Because the feedback is rapid, Cowboys will start to learn what kinds of testing, preparation, and checking really do let them work faster. They'll be learning the key truth that there is such a thing as "too fast"—many quality problems actually slow you down.

Engineers who have a tendency to wait too long before shipping also have lessons to learn. For one, the larger the batch size of their work, the harder it will be to get it integrated. At IMVU, we would occasionally hire someone from a more traditional organization who had a hard time letting go of his "best practices" and habits. Sometimes he'd advocate for doing his work on a separate branch and integrating only at the end. Although I'd always do my best to convince such people otherwise, if they were insistent I would encourage them to give it a try. Inevitably, a week or two later I'd enjoy the spectacle of watching them engage in something I called "code bouncing." It's like throwing a rubber ball against a wall. In a code bounce, someone tries to check in a huge batch. First he has integration conflicts, which require talking to various people on the team to know how to resolve them properly. Of course, while he is resolving the conflicts, new changes are being checked in. So, new conflicts appear. This cycle repeats for a while, until he either catches up to all the conflicts or just asks the rest of the team for a general check-in freeze. Then the fun part begins. Getting a large batch through the continuous integration server, incremental deploy system, and real-time monitoring system almost never works on the first try. Thus, the large batch gets reverted. While the problems are being fixed, more changes are being checked in. Unless we freeze the work of the whole team, this can go on for days. But if we do engage in a general check-in freeze, we're driving up the batch size of everyone else, which will lead to future episodes of code bouncing. In my experience, just one or two episodes is enough to cure anyone of his desire to work in large batches.

Because continuous deployment encourages learning, teams that practice it are able to get faster over time. That's because each individual's incentives are aligned with the goals of the whole team. Each person works to drive down waste in his own work, and this true efficiency gain more than offsets the incremental overhead of having to build and maintain the infrastructure required to do continuous deployment. In fact, if you practice Five Whys too, you can build this entire infrastructure in a completely incremental fashion. It's really a lot of fun.
