A Few Advantages of Distributed Revision Control

Distributed revision control tools have for several years been as robust and usable as their previous-generation counterparts, yet people using older tools have not necessarily woken up to their advantages. There are a number of ways in which distributed tools shine relative to centralized ones.

For an individual developer, distributed tools are almost always much faster than centralized tools. This is for a simple reason: a centralized tool needs to talk over the network for many common operations, because most metadata is stored in a single copy on the central server. A distributed tool stores all of its metadata locally. All else being equal, talking over the network adds overhead to a centralized tool. Don’t underestimate the value of a snappy, responsive tool: you’re going to spend a lot of time interacting with your revision control software.

Distributed tools are indifferent to the vagaries of your server infrastructure, again because they replicate metadata to so many locations. If you use a centralized system and your server catches fire, you’d better hope that your backup media are reliable, and that your last backup was recent and actually worked. With a distributed tool, every contributor’s computer holds a complete copy of the project’s history, so you have as many backups as you have contributors.

The reliability of your network will affect distributed tools far less than it will centralized tools. You can’t even use a centralized tool without a network connection, except for a few highly constrained commands. With a distributed tool, if your network connection goes down while you’re working, you may not even notice. The only thing you won’t be able to do is talk to repositories on other computers, something that is relatively rare compared with local operations. If you have a far-flung team of collaborators, this may be significant.

Advantages for Open Source Projects

If you take a shine to an open source project and decide that you would like to start hacking on it, and that project uses a distributed revision control tool, you are at once a peer with the people who consider themselves the core of that project. If they publish their repositories, you can immediately copy their project history, start making changes, and record your work, using the same tools in the same ways as insiders. By contrast, with a centralized tool, you must use the software in a read-only mode unless someone grants you permission to commit changes to their central server. Until then, you won’t be able to record changes, and your local modifications will be at risk of corruption any time you try to update your client’s view of the repository.

The forking non-problem

It has been suggested that distributed revision control tools pose some sort of risk to open source projects because they make it easy to fork the development of a project. A fork happens when there are differences in opinion or attitude between groups of developers that cause them to decide that they can’t work together any longer. Each side takes a more or less complete copy of the project’s source code, and goes off in its own direction.

Sometimes the camps in a fork decide to reconcile their differences. With a centralized revision control system, the technical process of reconciliation is painful, and has to be performed largely by hand. You have to decide whose revision history is going to win, and graft the other team’s changes into the tree somehow. This usually loses some or all of one side’s revision history.

What distributed tools do with respect to forking is make forking the only way to develop a project. Every single change that you make is potentially a fork point. The great strength of this approach is that a distributed revision control tool has to be really good at merging forks, because forks are absolutely fundamental: they happen all the time.
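To make this concrete, here is a toy model in Python. It is a minimal sketch, assuming a history in which each changeset records only an identifier and the identifiers of its parents; the `Repo` and `Changeset` classes and their methods are hypothetical and are not Mercurial’s internal data structures or API. In the model, cloning copies the whole history, committing is purely local, a fork is simply two changesets that share a parent, and a merge is a changeset with two parents.

```python
# A toy model of a distributed history, for illustration only. Repo and
# Changeset are hypothetical classes, not Mercurial's internals or API.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    node: str                 # identifier of this changeset
    parents: tuple[str, ...]  # zero, one, or two parent identifiers
    description: str


class Repo:
    def __init__(self) -> None:
        # Every clone holds the complete history: all changesets, keyed by id.
        self.changesets: dict[str, Changeset] = {}

    def commit(self, node: str, parents: list[str], description: str) -> None:
        # Recording work is purely local; no server is consulted.
        for p in parents:
            if p not in self.changesets:
                raise KeyError(f"unknown parent {p}")
        self.changesets[node] = Changeset(node, tuple(parents), description)

    def clone(self) -> Repo:
        copy = Repo()
        copy.changesets = dict(self.changesets)
        return copy

    def pull(self, other: Repo) -> None:
        # Copy over any changesets we lack, in the other repo's (parents-first) order.
        for node, cs in other.changesets.items():
            self.changesets.setdefault(node, cs)

    def heads(self) -> list[str]:
        # A head is a changeset that no other changeset names as a parent.
        used = {p for cs in self.changesets.values() for p in cs.parents}
        return sorted(n for n in self.changesets if n not in used)

    def ancestors(self, node: str) -> set[str]:
        # The node itself plus everything reachable through parent links.
        seen, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(self.changesets[n].parents)
        return seen

    def common_ancestors(self, a: str, b: str) -> set[str]:
        return self.ancestors(a) & self.ancestors(b)


upstream = Repo()
upstream.commit("a", [], "initial import")
mine = upstream.clone()                       # an outsider's clone, fully writable
upstream.commit("b", ["a"], "core team's change")
mine.commit("c", ["a"], "outsider's change")  # "a" has become a fork point
mine.pull(upstream)
print(mine.heads())                           # ['b', 'c']
print(mine.common_ancestors("b", "c"))        # {'a'}
mine.commit("m", ["b", "c"], "merge")         # a merge changeset has two parents
print(mine.heads())                           # ['m']
```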

If every piece of work that everybody does, all the time, is framed in terms of forking and merging, then what the open source world refers to as a fork becomes purely a social issue. If anything, distributed tools lower the likelihood of a fork:

  • They eliminate the social distinction that centralized tools impose: that between insiders (people with commit access) and outsiders (people without).

  • They make it easier to reconcile after a social fork, because all that’s involved from the perspective of the revision control software is just another merge.

Some people resist distributed tools because they want to retain tight control over their projects, and they believe that centralized tools give them this control. However, if you hold this belief and publish your CVS or Subversion repositories, there are plenty of tools available that can pull out your entire project’s history (albeit slowly) and recreate it somewhere that you don’t control. So your control in this case is illusory, and by clinging to it you forgo the ability to collaborate fluidly with whoever feels compelled to mirror and fork your history.

Advantages for Commercial Projects

Many commercial projects are undertaken by teams that are scattered across the globe. Contributors who are far from a central server will see slower command execution and perhaps less reliability. Commercial revision control systems attempt to ameliorate these problems with remote-site replication add-ons that are typically expensive to buy and cantankerous to administer. A distributed system doesn’t suffer from these problems in the first place. Better yet, you can easily set up multiple authoritative servers, say one per site, so that there’s no redundant communication between repositories over expensive long-haul network links.

Centralized revision control systems tend to have relatively low scalability. It’s not unusual for an expensive centralized system to fall over under the combined load of just a few dozen concurrent users. Once again, the typical response tends to be an expensive and clunky replication facility. Since the load on a central server—if you have one at all—is many times lower with a distributed tool (because all of the data is replicated everywhere), a single cheap server can handle the needs of a much larger team, and replication to balance load becomes a simple matter of scripting.
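Here is a minimal sketch of such a script in Python, assuming one master repository and a list of mirror repositories that should periodically pull from it (for example, from a cron job). The function names, the `dry_run` flag, and the example paths and URL are all hypothetical; the only real interface it relies on is the `hg` command line, invoked as `hg --repository PATH pull SOURCE`.

```python
# Hypothetical sketch: keep mirror repositories in sync with a master.
# Assumes `hg` is on PATH and each mirror path is an existing Mercurial
# repository; the paths and URL used below are placeholders.
from __future__ import annotations

import subprocess


def pull_commands(master: str, mirrors: list[str]) -> list[list[str]]:
    """Build one `hg --repository MIRROR pull MASTER` command per mirror."""
    return [["hg", "--repository", mirror, "pull", master] for mirror in mirrors]


def sync_mirrors(master: str, mirrors: list[str], dry_run: bool = True) -> list[str]:
    """Run (or, with dry_run, only print) each pull; return mirrors that failed."""
    failures = []
    for mirror, cmd in zip(mirrors, pull_commands(master, mirrors)):
        if dry_run:
            print(" ".join(cmd))
            continue
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:  # treat any nonzero exit status as a failure
            failures.append(mirror)
    return failures


if __name__ == "__main__":
    sync_mirrors("https://hg.example.com/project",
                 ["/srv/hg/mirror-a", "/srv/hg/mirror-b"])
```

Because a pull only fetches the changesets a mirror is missing, rerunning the script after a failure simply catches that mirror up.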

If you have an employee in the field, troubleshooting a problem at a customer’s site, they’ll benefit from distributed revision control. The tool will let them generate custom builds, try different fixes in isolation from each other, and search efficiently through history for the sources of bugs and regressions in the customer’s environment, all without needing to connect to your company’s network.
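One efficient way to search history for the source of a regression is bisection: test a revision halfway between a known-good one and a known-bad one, then discard the half that cannot contain the culprit. Here is a minimal Python sketch of that idea, assuming a linear history and a test that, once it starts failing, keeps failing; `first_bad` and `build_and_test` are hypothetical names. Mercurial’s own `hg bisect` command applies the same idea to a real repository, and since every revision is available locally, no step needs a network connection.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first revision for which is_bad() is true.

    Assumes revisions run oldest to newest, revisions[0] is good, the last
    revision is bad, and every revision after a bad one is also bad.
    Needs about log2(len(revisions)) tests rather than one per revision.
    """
    good, bad = 0, len(revisions) - 1  # invariant: [good] is good, [bad] is bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):
            bad = mid
        else:
            good = mid
    return revisions[bad]


history = [f"r{i}" for i in range(100)]  # hypothetical linear history, oldest first
tested = []


def build_and_test(rev: str) -> bool:
    # Hypothetical stand-in for "check out rev, build it, run the failing test".
    tested.append(rev)
    return int(rev[1:]) >= 37            # pretend the regression arrived in r37


print(first_bad(history, build_and_test))  # r37
print(len(tested))                          # 6
```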
