Chapter 4. Behind the Scenes

Unlike many revision control systems, Mercurial is built on concepts simple enough that it’s easy to understand how the software really works. Knowing these details isn’t necessary, so you can safely skip this chapter. However, I think you will get more out of the software if you have a mental model of what’s going on.

Being able to understand what’s going on behind the scenes gives me confidence that Mercurial has been carefully designed to be both safe and efficient. And just as importantly, if it’s easy for me to retain a good idea of what the software is doing when I perform a revision control task, I’m less likely to be surprised by its behavior.

In this chapter, we’ll first cover the core concepts behind Mercurial’s design, then discuss some of the interesting details of its implementation.

Mercurial’s Historical Record

Tracking the History of a Single File

When Mercurial tracks modifications to a file, it stores the history of that file in a metadata object called a filelog. Each entry in the filelog contains enough information to reconstruct one revision of the file that is being tracked. Filelogs are stored as files in the .hg/store/data directory. A filelog contains two kinds of information: revision data, and an index to help Mercurial find a revision efficiently.

A file that is large, or has a lot of history, has its filelog stored in separate data (.d suffix) and index (.i suffix) files. For small files without much history, the revision data and index are combined in a single .i file. The correspondence between a file in the working directory and the filelog that tracks its history in the repository is illustrated in Figure 4-1.

Figure 4-1. Relationships between files in working directory and filelogs in repository
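The correspondence between a tracked file and its filelog can be sketched in a few lines of Python. This is only an illustration of the naming scheme described above, not Mercurial’s own code: the function name `filelog_paths` is hypothetical, and a real repository may additionally encode some characters in file names before storing them, which this sketch ignores.

```python
from pathlib import PurePosixPath

# Directory named in the text as the home of all filelogs.
STORE_DATA = PurePosixPath(".hg/store/data")


def filelog_paths(tracked_path: str) -> dict:
    """Return the candidate filelog file names for a tracked file.

    Hypothetical helper: assumes the tracked path maps directly onto the
    store directory, with no file name encoding applied.
    """
    base = STORE_DATA / tracked_path
    return {
        # Always present; for small files it also holds the revision data.
        "index": f"{base}.i",
        # Only present for large files or files with a lot of history.
        "data": f"{base}.d",
    }


paths = filelog_paths("src/main.c")
print(paths["index"])
print(paths["data"])
```

For a small file with little history, only the `.i` path would exist on disk, since the revision data and index are combined in that one file.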

Managing Tracked Files

Mercurial uses a structure called a manifest to collect together information about the files that it tracks. Each entry in the manifest contains information about the files present in a single changeset. An entry records which files are present in the changeset, the revision of each file, and a few other pieces of file metadata.
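One way to picture a single manifest entry is as a mapping from file name to the filelog revision in use, plus a small metadata slot. The sketch below is a simplified model under that assumption; the names `ManifestFile`, `filelog_rev`, and `flags` are made up for illustration and do not describe Mercurial’s on-disk format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestFile:
    """What one manifest entry records about one file (illustrative only)."""
    filelog_rev: int  # which revision of the file's filelog this changeset uses
    flags: str = ""   # assumed metadata slot, e.g. "x" for an executable file


# One manifest entry: every file present in a single changeset.
manifest_entry = {
    "README": ManifestFile(filelog_rev=0),
    "src/main.c": ManifestFile(filelog_rev=2),
    "build.sh": ManifestFile(filelog_rev=1, flags="x"),
}


def files_in_changeset(manifest: dict) -> list:
    """List the files that are present in the changeset, in sorted order."""
    return sorted(manifest)


print(files_in_changeset(manifest_entry))
```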

Recording Changeset Information

The changelog contains information about each changeset. Each revision records who committed a change, the changeset comment, other pieces of changeset-related information, and the revision of the manifest to use.

Relationships Between Revisions

Within a changelog, a manifest, or a filelog, each revision stores a pointer to its immediate parent (or to its two parents, if it’s a merge revision). There are also relationships between revisions across these structures, and they are hierarchical in nature.
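The parent pointers within one structure can be modeled as a small graph. The following is a minimal sketch, assuming revisions are numbered from zero and that a sentinel value (here `NULL_REV`, an assumption of this example) stands for “no parent”. The names `Revision`, `p1`, `p2`, and `ancestors` are illustrative.

```python
from dataclasses import dataclass

NULL_REV = -1  # assumed sentinel meaning "no parent"


@dataclass(frozen=True)
class Revision:
    p1: int = NULL_REV  # immediate parent
    p2: int = NULL_REV  # second parent, set only for a merge revision

    def is_merge(self) -> bool:
        return self.p1 != NULL_REV and self.p2 != NULL_REV


def ancestors(log: list, rev: int) -> set:
    """Follow parent pointers from rev and collect every revision reached."""
    seen = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        for parent in (log[current].p1, log[current].p2):
            if parent != NULL_REV and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Revision 0 is the root; 1 and 2 both descend from 0; 3 merges 1 and 2.
log = [Revision(), Revision(p1=0), Revision(p1=0), Revision(p1=1, p2=2)]

print(ancestors(log, 3))
print(log[3].is_merge())
```

The same shape applies whether `log` stands for a changelog, a manifest, or a filelog, since each stores parent pointers in the same way.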

For every changeset in a repository, there is exactly one revision stored in the changelog. Each revision of the changelog contains a pointer to a single revision of the manifest. A revision of the manifest stores a pointer to a single revision of each filelog tracked when that changeset was created. These relationships are illustrated in Figure 4-2.

Figure 4-2. Metadata relationships

As the illustration shows, there is not a one-to-one relationship between revisions in the changelog, manifest, or filelog. If a file that Mercurial tracks hasn’t changed between two changesets, the entry for that file in the two revisions of the manifest will point to the same revision of its filelog.[3]
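The chain of pointers from changelog to manifest to filelog, and the sharing of a filelog revision when a file is unchanged, can be sketched with plain Python containers. Everything here is a toy model: the names `ChangelogEntry`, `manifests`, `filelogs`, and `checkout` are assumptions of this example, and full file contents stand in for whatever a real filelog stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangelogEntry:
    user: str          # who committed the change
    description: str   # the changeset comment
    manifest_rev: int  # pointer to a single revision of the manifest


# Filelogs: file name -> contents, indexed by filelog revision number.
filelogs = {
    "a.txt": ["a v0", "a v1"],
    "b.txt": ["b v0"],
}

# Manifest revisions: each maps a file name to one filelog revision.
manifests = [
    {"a.txt": 0, "b.txt": 0},
    {"a.txt": 1, "b.txt": 0},  # b.txt unchanged, so same filelog revision
]

changelog = [
    ChangelogEntry("alice", "initial commit", manifest_rev=0),
    ChangelogEntry("bob", "edit a.txt", manifest_rev=1),
]


def checkout(changeset: int) -> dict:
    """Resolve changeset -> manifest -> filelog revisions into file contents."""
    manifest = manifests[changelog[changeset].manifest_rev]
    return {name: filelogs[name][rev] for name, rev in manifest.items()}


print(checkout(1))
```

In this model there are two changesets and two manifest revisions, but `b.txt` has only one filelog revision, which both manifest revisions point to.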



[3] It is possible (though unusual) for the manifest to remain the same between two changesets, in which case the changelog entries for those changesets will point to the same revision of the manifest.
