Mercurial Compared with Other Tools

Before you read on, please understand that this section necessarily reflects my own experiences, interests, and (dare I say it) biases. I have used every one of the revision control tools listed below, in most cases for several years at a time.

Subversion

Subversion is a popular revision control tool, developed to replace CVS. It has a centralized client/server architecture.

Subversion and Mercurial have similarly named commands for performing the same operations, so if you’re familiar with one, it is easy to learn to use the other. Both tools are portable to all popular operating systems.

Prior to version 1.5, Subversion had no useful support for merges. At the time of writing, its merge tracking capability is new, and known to be complicated and buggy.

Mercurial has a substantial performance advantage over Subversion on every revision control operation I have benchmarked. I have measured its advantage as ranging from a factor of two to a factor of six when compared with Subversion 1.4.3’s ra_local file store, which is the fastest access method available. In more realistic deployments involving a network-based store, Subversion will be at a substantially larger disadvantage. Because many Subversion commands must talk to the server and Subversion does not have useful replication facilities, server capacity and network bandwidth become bottlenecks for modestly large projects.

Additionally, Subversion incurs substantial storage overhead to avoid network transactions for a few common operations, such as finding modified files (status) and displaying modifications against the current revision (diff). As a result, a Subversion working copy is often the same size as, or larger than, a Mercurial repository and working directory, even though the Mercurial repository contains a complete history of the project.

Subversion is widely supported by third-party tools. Mercurial currently lags considerably in this area. This gap is closing, however, and indeed some of Mercurial’s GUI tools now outshine their Subversion equivalents. Like Mercurial, Subversion has an excellent user manual.

Because Subversion doesn’t store revision history on the client, it is well suited to managing projects that deal with lots of large, opaque binary files. If you check in fifty revisions to an incompressible 10MB file, Subversion’s client-side space usage stays constant. The space used by any distributed SCM will grow rapidly in proportion to the number of revisions, because the differences between each revision are large.
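
The arithmetic behind this example can be sketched in a few lines of Python. This is a toy worst-case model, not a measurement: the function names are made up for illustration, and it assumes each revision of the file is stored in full because the deltas do not compress.

```python
# Toy model of client-side storage for one large binary file.
# All names and figures are illustrative assumptions, not measurements.

def centralized_client_mb(file_size_mb, revisions):
    """Client holds only the checked-out revision, so history adds nothing.

    (This ignores any pristine copy the client may keep alongside the file;
    that would change the constant, not the growth.)
    """
    return file_size_mb


def distributed_client_mb(file_size_mb, revisions):
    """Worst case: every revision is stored in full in the local history."""
    return file_size_mb * revisions


if __name__ == "__main__":
    print(centralized_client_mb(10, 50))  # constant, regardless of revisions
    print(distributed_client_mb(10, 50))  # grows linearly with revisions
```

Under these assumptions, fifty revisions of a 10MB file cost the centralized client 10MB and the distributed client 500MB.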

In addition, it’s often difficult (or more usually, impossible) to merge different versions of a binary file. Subversion’s ability to let a user lock a file, so that they temporarily have the exclusive right to commit changes to it, can be a significant advantage to a project where binary files are widely used.

Mercurial can import revision history from a Subversion repository. It can also export revision history to a Subversion repository. This makes it easy to test the waters and use Mercurial and Subversion in parallel before deciding to switch. History conversion is incremental, so you can perform an initial conversion, then small additional conversions afterwards to bring in new changes.
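
As a rough sketch of what an incremental conversion might look like when scripted, the Python below wraps the hg convert command, which is provided by Mercurial’s bundled convert extension. The helper names and the example paths are hypothetical, and the sketch assumes that hg is on the PATH and that whatever Subversion support the extension needs is installed.

```python
import subprocess
from typing import List


def convert_command(source: str, dest: str) -> List[str]:
    """Build an `hg convert` command line.

    The convert extension ships with Mercurial but is off by default, so it
    is enabled here for this one invocation via --config.
    """
    return ["hg", "--config", "extensions.convert=", "convert", source, dest]


def run_conversion(source: str, dest: str) -> None:
    """Run the conversion; raises CalledProcessError if hg fails.

    Running this again later with the same source and destination is the
    incremental step: the assumption is that convert records which source
    revisions it has already handled and only brings in the new ones.
    """
    subprocess.run(convert_command(source, dest), check=True)
```

A first call such as run_conversion("/path/to/svn-checkout", "project-hg") would perform the initial conversion, and repeating the same call afterwards would pick up changes committed to Subversion in the meantime.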

Git

Git is a distributed revision control tool that was developed for managing the Linux kernel source tree. Like Mercurial, its early design was somewhat influenced by Monotone (described at the end of this chapter).

Git has a very large command set, with version 1.5.0 providing 139 individual commands. It has something of a reputation for being difficult to learn. Compared to Git, Mercurial has a strong focus on simplicity.

In terms of performance, Git is extremely fast. In several cases, it is faster than Mercurial, at least on Linux, while Mercurial performs better on other operations. However, on Windows, the performance and general level of support that Git provides are, at the time of writing, far behind those of Mercurial.

While a Mercurial repository needs no maintenance, a Git repository requires frequent manual repacks of its metadata. Without these, performance degrades, while space usage grows rapidly. A server that contains many Git repositories that are not rigorously and frequently repacked will become heavily disk-bound during backups, and there have been instances of daily backups taking far longer than 24 hours as a result. A freshly packed Git repository is slightly smaller than a Mercurial repository, but an unpacked repository is several orders of magnitude larger.

The core of Git is written in C. Many Git commands are implemented as shell or Perl scripts, and the quality of these scripts varies widely. I have encountered several instances where scripts charged along blindly in the presence of errors that should have been fatal.

Mercurial can import revision history from a Git repository.

CVS

CVS is probably the most widely used revision control tool in the world. Due to its age and internal untidiness, it has been only lightly maintained for many years.

It has a centralized client/server architecture. It does not group related file changes into atomic commits, making it easy for people to break the build: one person can successfully commit part of a change and then be blocked by the need for a merge, so that other people see only a portion of the work that person intended to commit. This also affects how you work with project history. If you want to see all of the modifications someone made as part of a task, you will need to manually inspect the descriptions and timestamps of the changes made to each file involved (if you even know what those files were).
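
The difference between per-file and atomic commits can be illustrated with a toy model in Python. This is not how CVS or any other tool is implemented; the repository is just a dictionary, and the function names and the notion of a set of "conflicting" files are assumptions made for the sake of the sketch.

```python
def commit_per_file(repo, changes, conflicting):
    """Commit files one at a time, stopping at the first that needs a merge.

    Files committed before the blocked one stay committed, so the
    repository can be left holding only part of the change.
    """
    for name, content in changes.items():
        if name in conflicting:
            return False  # blocked; earlier files are already in the repo
        repo[name] = content
    return True


def commit_atomic(repo, changes, conflicting):
    """Commit all files together, or none of them if any needs a merge."""
    if any(name in conflicting for name in changes):
        return False  # nothing has been written
    repo.update(changes)
    return True


if __name__ == "__main__":
    change = {"a.c": "new", "b.c": "new"}

    repo = {"a.c": "old", "b.c": "old"}
    commit_per_file(repo, change, conflicting={"b.c"})
    print(repo)  # a.c is updated but b.c is not: a half-applied change

    repo = {"a.c": "old", "b.c": "old"}
    commit_atomic(repo, change, conflicting={"b.c"})
    print(repo)  # unchanged: other people never see a partial change
```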

CVS has a muddled notion of tags and branches that I will not attempt to even describe. It does not support renaming of files or directories well, making it easy to corrupt a repository. It has almost no internal consistency checking capabilities, so it is usually not even possible to tell whether or how a repository is corrupt. I would not recommend CVS for any project, existing or new.

Mercurial can import CVS revision history. However, there are a few caveats that apply; these are true of every other revision control tool’s CVS importer, too. Due to CVS’s lack of atomic changes and unversioned filesystem hierarchy, it is not possible to reconstruct CVS history completely accurately; some guesswork is involved, and renames will usually not show up. Because a lot of advanced CVS administration has to be done by hand and is hence error-prone, it’s common for CVS importers to run into multiple problems with corrupted repositories (completely bogus revision timestamps and files that have remained locked for over a decade are just two of the less interesting problems I can recall from personal experience).

Commercial Tools

Perforce has a centralized client/server architecture, with no client-side caching of any data. Unlike modern revision control tools, Perforce requires that a user run a command to inform the server about every file they intend to edit.

The performance of Perforce is quite good for small teams, but it falls off rapidly as the number of users grows beyond a few dozen. Modestly large Perforce installations require the deployment of proxies to cope with the load their users generate.

Choosing a Revision Control Tool

With the exception of CVS, all of the tools listed above have unique strengths that suit them to particular styles of work. There is no single revision control tool that is best in all situations.

As an example, Subversion is a good choice for working with frequently edited binary files, due to its centralized nature and support for file locking.

I personally find Mercurial’s properties of simplicity, performance, and good merge support to be a compelling combination that has served me well for several years.
