Chapter 10. Editing History

In this chapter, we cover various techniques for editing repository history. Earlier chapters covered simple cases of this, focused on correcting individual commits; here, we’re concerned with larger changes: moving branches, merging or splitting repositories, systematically altering an entire history, and so on.

The caution given earlier bears repeating here: you should not generally use any of these techniques on history that has already been published to other people! It will break their ability to use the push/pull mechanism, which may be very difficult and awkward to recover from. Only use these on private repositories, or if you can coordinate the change with everyone involved. It’s easiest if all users of a shared repository commit and push all their outstanding changes, then simply reclone it after you make your edits. Or, they can use git rebase instead as we are about to describe, if they’re a bit more adventurous.

Rebasing

We have already covered special cases of rebasing, especially for editing a sequence of commits at a branch tip; here, we consider the general case. The general purpose of git rebase is to move a branch from one location to another. Since commits are immutable, they can’t actually be moved (their parent commits would change), so this entails making new commits with the same changesets and metadata: author, committer, timestamps, and so on. The steps Git follows during a rebase are as follows:

  1. Identify the commits to be moved (more accuratedly, replicated).
  2. Compute the corresponding changesets (patches).
  3. Move HEAD to the new branch location (base).
  4. Apply the changesets in order, making new commits preserving author information.
  5. Finally, update the branch ref to point to the new tip commit.

The process of making new commits with the same changesets as existing ones is called “replaying” those commits. Step 4 can be modified with an “interactive rebase” (git rebase --interactive (-i)), allowing you to edit the commits in various ways as you move them; see the earlier discussion of this feature (Editing a Series of Commits).

The most general form of the command is:

$ git rebase [--onto newbase] [upstream] [branch]

which means to replay the commit set upstream..branch starting at newbase. The defaults are:

upstream: HEAD@{upstream}
The upstream of the current branch, if any
branch:
HEAD
newbase:
The upstream argument, whatever its default or user-supplied value is

For example, given the commit graph in Figure 10-1, the command git rebase --onto C master topic would move the topic branch as shown in Figure 10-2.

Before rebasing
Figure 10-1. Before rebasing
After rebasing
Figure 10-2. After rebasing

1′, 2′, and 3′ are new commits replicating the changesets of commits 1, 2, and 3. Calling B the “base” of the original (unmerged portion of) the topic branch, this changes the base from B to C, thus “rebasing” the branch.

The behavior of the default arguments to git rebase reveals the simplest use of rebasing: keeping a sequence of local commits at the tip of a branch as the upstream progresses, rather than performing a merge. After doing a git fetch, you see that your local master branch has diverged from its upstream counterpart (see Figure 10-3).

Before rebasing
Figure 10-3. Before rebasing

Following the preceding defaults, the simple command:

$ git rebase

actually means:

$ git rebase --onto origin/master origin/master master

which in turn means to replay the commit set origin/master..master at origin/master, resulting in the change shown in Figure 10-4.

After rebasing
Figure 10-4. After rebasing

Your local commits 1, 2, and 3 have been shifted forward to remain based off the tip of the upstream master. This is such a common use of rebasing for certain workflows that there is a --rebase option to git pull that runs git rebase in between its fetch and merge steps (see Pull with Rebase). In this case, the final step merging origin/master into master will do nothing, since if the rebase is successful, the upstream branch origin/master is now already contained in the local master.

Undoing a Rebase

The final step of a successful rebase is to repoint the ref of the branch being moved, from the old tip commit to the new one. The original commits are not immediately expunged in any way, but merely abandoned: they are no longer reachable in the commit graph from any branch, and will eventually be garbage collected by Git if they remain that way for some time. To undo the rebase operation, then, all you need to do is move the branch ref back to its original spot, which you can discover using the reflog. After the git rebase, for example, your reflog would look something like this:

$ git log -g
b61101ac HEAD@{0}: rebase finished: returning to
refs/heads/master
b61101ac HEAD@{1}: rebase: 3
6f554c9a HEAD@{2}: rebase: 2
cb7496ab HEAD@{3}: rebase: 1
baa5d906 HEAD@{4}: checkout: moving from master to
baa5d906...
e3a1d5b0 HEAD@{5}: commit: 3

The checkout step is the beginning of the rebase as Git moves HEAD to the new base, the tip of the upstream origin/master (here at commit baa5d906). The rebase steps replay commits 1, 2, and 3 at the new location, and in the final step, the local master branch (full ref name refs/heads/master) is reset to the new tip commit. In the earliest reflog entry, you can see when you made your original version of commit 3, with commit ID e3a1d5b0. To return to that state, all you need to do is:

$ git reset --hard e3a1d5b0

The original tip commit might not be at the same spot as shown here, since that depends on the exact sequence of commands you used, but it will show up somewhere earlier in the reflog.

Importing from One Repository to Another

Suppose you would like to combine two repositories—say, to import the entire content of repository B as a subdirectory b of repository A. You could just copy the working tree of B into A and then add and commit it, of course, but you want to retain the history of repository B as well as the content. Though that’s easy to say, it’s not immediately clear what this means. The Git history of each repository consists of an entire graph of individual content snapshots, branching and merging in possibly complex ways over time, and there are different ways in which you might want to combine the two. In this section, we discuss a few of them.

Importing Disconnected History

The simplest way to combine two repositories is simply to import the whole commit graph of one into the other, without connecting them in any way. Ordinarily, a repository has a single “root commit,” that is, a commit with no parents—the first commit created after the repository was initialized, of which all other commits are descendants. However, there is nothing preventing you from having multiple root commits in a single repository, in which case the commit graph consists of multiple disconnected regions; in Figure 10-5, commits A and 1 are both root commits.

Multiple root commits
Figure 10-5. Multiple root commits

Here’s how to import repository B into repository A in this fashion:

$ cd A
$ git remote add B URL
$ git fetch B
warning: no common commits
...
 [new branch]        master     -> B/master
 [new branch]        zorro      -> B/zorro
$ git for-each-ref --shell \
   --format='git branch --no-track %(refname:short)
   %(refname:short)' \
   'refs/remotes/B/*' | sed -e 's:/:-:' | sh -x 
$ git branch
 B-master
 B-zorro
 master
$ git remote rm B

This recipe uses git for-each-ref, a versatile tool for generating scripts on the fly that apply to a given set of refs (in this case, branches). For each named ref, here selected by the pattern refs/remotes/B/* (all tracking branches for the remote B), it generates a separate git branch command substituting the names marked with %(…) to reflect that ref. This series of commands then goes through the Unix command sed rewriting B/foo to B-foo, and finally, the commands are run by feeding them into the Bourne shell (sh). (See git-for-each-ref(1) for more detail on this handy command for Git automation.)

The case of pulling in a completely disconnected commit graph is sufficiently unusual that Git warns you about it, saying that the repository you’re fetching has “no common commits” with this one.

After the fetch command, Git has copied the entire commit graph of B into the object database of A, but the only references in A to the new branches are remote-tracking ones: B/master and B/zorro. To finish incorporating B into A, we need to make local branches for these. The git for-each-ref incantation prepares and runs a set of Git commands that create a local branch named B-x for each remote branch B/x, by running git branch --no-track B-x B/x. The --no-track option avoids creating unnecessary tracking relationships that would just be removed later. We prefix the new branch names with B-, since there may be clashes (as here, where there is a master branch in each repository). Finally, when done, we remove the remote B, since we do not intend to continue tracking the other repository; the remote was just a mechanism to perform the import.

This demonstrates a general way of doing it; you can of course just run the appropriate git branch commands yourself to name the new branches as you wish, if there are few of them, or rename them afterward with git branch -m old new.

Although this is the easiest method of combining two histories, it is also not usually what you want to do, because you can’t use the Git merge mechanism on branches that were originally part of distinct histories. This is because git merge looks for a “merge base”: a common ancestor commit of the branches being merged—and in this case, there is no such commit. You might use this technique if you have rearranged the history of a repository, but want to keep the original history around for reference, and it’s more convenient to have it both in one repository than split over two.

Importing Linear History

To import history so that it is connected to the commit graph of the receiving repository, you can’t just use the existing commits of the donor repository as before, since you need new parent commit pointers to connect the two histories and you can’t actually change commits. Instead, you must make new commits introducing the same content (as with git rebase). If the history you want to import is linear—either the entire repository, or the branch you’re interested in—then you can use git format-patch and git am to do this easily (these commands are described more fully in Patches with Commit Information). Here’s a formula for adding the complete history of branch foo in repository B to the current branch in repository A:

$ cd A
$ git --git-dir /path/to/B/.git format-patch --root --stdout foo | git am

This formats the commits on branch foo in repository B as a series of patches with accompanying metadata (author, committer, timestamps, and so on), and feeds that into git am, which applies the patches as new commits in repository A. Note that you can’t refer to B here directly as a remote repository with a URL; you need a local copy to use, for which you can just clone B and check out the branch you want to import.

Because you are now applying patches rather than importing commits whole, you might encounter conflicts if the source and destination repositories have overlapping content (the same filenames). To avoid this, you can tell git am to prepend a directory to all filenames with --directory, thus depositing the files in the imported history inside a new directory. Combined with the -pn option, which first removes n leading directories from those filenames, and with limiting the source files via an argument to git format-patch, you can import a particular directory or other subset of files into a new directory without conflicts. Extending the preceding example:

$ cd A
$ git --git-dir /path/to/B/.git format-patch --root --stdout foo -- src | git am -p2 --directory dst

This imports the history on the branch foo in repository B, limited to files in the directory src, and places those files in directory dst instead in repository A.

Without --root, just giving a rev foo means foo..HEAD: the recent commits on the current branch that are not in the history of foo. You can also give a range expression of your own to specify the commits to include (e.g., 9ec0eafb..master).

Warning

If the source branch history is not linear (contains merge commits), git format-patch won’t complain; it will just produce patches for all the nonmerge commits. This is likely to cause conflicts; see the next section.

Importing Nonlinear History

Because the git format-patch/git am technique works only on a linear source history, here is a recipe for importing a branch with a nonlinear history, using git rebase instead. You can use the present procedure on linear history as well, if you find the previous one too slow or unwieldy (as it might be; it’s simpler, but it’s not what those commands are really intended to do).

The following example adds the history of the branch isis in a remote repository to the tip of the current branch in this one (here, the master branch):

# Add the source repository as a temporary remote
# named “temp”.
$ git remote add temp URL
# Fetch the branch “isis” from the remote.
$ git fetch temp isis
...
* branch              isis      -> FETCH_HEAD
# Make a local branch named “import” for the remote
# branch we want to bring in.
$ git branch import FETCH_HEAD
# Replay the “import” branch commits on the current
# branch, preserving merges.
$ git rebase --preserve-merges --root --onto HEAD import
...
Successfully rebased and updated refs/heads/import.
# Finally, fast-forward the local branch (master) to
# its new tip (where “import” is now), and remove the
# temporary branch and remote.
$ git checkout master
Switched to branch 'master'
$ git merge import
Updating dffbfac7..6193cf87
Fast-forward
...
$ git branch -d import
Deleted branch import (was 6193cf87).
$ git remote rm temp

This technique copies the source branch into the current repository under a temporary name, uses git rebase to graft it onto the tip of the current branch, then moves the local branch up to its new tip and deletes the temporary import branch.

Unfortunately, git rebase lacks the capabilities provided by the various arguments and options to git format-patch and git am shown earlier, which let you relocate files as you import to avoid pathname conflicts. To get the same result, you’ll need to clone the source repository and rearrange it first before importing from it. The section The Big Hammer: git filter-branch shows how to do this.

Commit Surgery: git replace

Sometimes, you really just need to replace a single commit—but it’s buried in the middle of a complex history with multiple branches that would be difficult to rewrite using git rebase -i. For example, suppose you accidentally used the wrong committer name at one point, perhaps because you had GIT_COMMITTER_NAME set and forgot to change it for this repository with git config user.name:

$ git log --format='%h %an'
...
0922daf4 Richard E. Silverman
6426690c Richard E. Silverman
03f482d6 Bozo the Clown
27e9535f Richard E. Silverman
78d481d3 Richard E. Silverman
...

Git has a command, git replace, which allows you to perform “commit surgery” by replacing any commit with a different one, without disturbing the commits around it. Now, your first instinct at this point should be to say, “That’s impossible”; we’ve explained before that because commits point to their parents, it’s impossible to alter a commit that has children without recursively altering all commits after that point in the history as well. That’s still true, and git replace is actually a trick, as we’ll see.

To fix commit 03f482d6, we first check it out and amend it with the correct author name, creating the new commit we want to use as a replacement:

$ git checkout 03f482d6
Note: checking out '03f482d6'.

You are in 'detached HEAD' state...
$ git commit --amend --reset-author -C HEAD
[detached HEAD 42627abe] add big red nose
...

Now we have a new commit, 42627abe, which has the same content and parents as the faulty commit (and now, the correct author name). It is sitting off to the side on the commit graph shown in Figure 10-6.

Replacing a commit
Figure 10-6. Replacing a commit

And now we just need to get commit 6426690c to “believe” that it has 42627abe as a parent instead of 03f482d6. The magic command is:

$ git replace 03f482d6 42627abe

And now, after returning to the original location (say, master), we see this:

$ git log --format='%h %an'
...
0922daf4 Richard E. Silverman
6426690c Richard E. Silverman
03f482d6 Richard E. Silverman
27e9535f Richard E. Silverman
78d481d3 Richard E. Silverman
...

This is, quite simply, a lie. This log claims that commit 03f482d6 now has a different author but the same commit ID, which is effectively impossible. What has happened is that git replace manages a list of object replacements, recorded in the namespace refs/replace; the name of a ref there is the ID of an object to replace, and its referent is the ID of the replacement:

$ git show-ref | grep refs/replace
42627abe6d4b1e19cb55… refs/replace/03f482d654930f7aa1…

While Git operates, whenever it retrieves the contents of an object in the object database it checks the replacement list first, and silently substitutes the contents of the replacement object, if any. Thus, in the preceding example, Git still displays the original commit ID, but shows the corrected author (which is in the content of the replacement commit).

Keeping It Real

The replacement list is an artifact of your repository; it alters your view of the commit graph, but not the graph itself. If you were to clone this repository or push to another one, the replacement would not be visible there. To make it “real,” we have to actually rewrite all the subsequent commits, which you can do thus:

$ git filter-branch -- --all

but see the next section on git filter-branch for more on that command.

The usual workflow with this feature, then, is as follows:

  1. Use git replace to make the commit graph appear as you want it.
  2. Use git filter-branch to reify the change.
  3. Push the changes elsewhere, if necessary.

You probably don’t want to push while you have replacements in force, since you don’t really know what you’re pushing!

Note

There is an older Git feature called “grafts” that does something similar; you edit the file .git/info/grafts to contain directives explicitly altering the parent list for given commits. It’s harder to use, though, and is supplanted by git replace.

Warning

git replace affects only the commit you replace; even when you apply the replacement with git filter-branch, changes you make to the content (tree) of a commit do not ripple through to child commits. For example, suppose in the amended commit earlier you had deleted a file as well as fixed the author name. You might expect that this would cause that file to disappear from the entire history starting at that point, but it would not; instead, it would simply reappear in subsequent commits, since you did not change the trees of those commits. Use git rebase -i to effect such changes.

The Big Hammer: git filter-branch

git filter-branch is the most general tool for altering the history of a repository. It walks the portion of the commit graph you specify (by default, the current branch), applying various filters you supply and rewriting commits as necessary. You can use it to make wholesale programmatic alterations to the entire history. Since this is an advanced command, we will just sketch its operation and refer the reader to git-filter-branch(1) for more detail.

You can apply the following filters, whose string arguments are passed to the shell. When they run, the environment contains the following variables reflecting the commit being rewritten:

  • GIT_COMMIT (commit ID)
  • GIT_AUTHOR_NAME
  • GIT_AUTHOR_EMAIL
  • GIT_AUTHOR_DATE
  • GIT_COMMITTER_NAME
  • GIT_COMMITTER_EMAIL
  • GIT_COMMITTER_DATE

The filters are:

--env-filter
Modifies the environment in which the commit will happen (e.g., you can change the author name by setting and exporting GIT_AUTHOR_NAME).
--tree-filter
Modifies commit contents by altering the working tree. Git treats the resulting tree as if you had run git add -Af, reflecting all new and deleted files while ignoring the usual “ignore” rules in .gitignore and so on.
--index-filter
Modifies commit contents by altering the index. If you can effect the changes you want solely by manipulating the index, then this is much faster than --tree-filter since it does not have to check out the working tree. We give an example of this in Expunging Files.
--parent-filter
Modifies the commit’s parent list, transforming the list from stdin to stdout. The list is in the format specified by git-commit-tree(1).
--msg-filter
Modifies the commit message, transforming the message from stdin to stdout.
--commit-filter
Git runs this instead of the normal git commit-tree to actually perform the commit.
--tag-name-filter
Transforms the names of tags pointing to rewritten objects from stdin to stdout.

The value of this option is not a shell command, but rather a directory name:

--subdirectory-filter
Consider only history relevant to the given directory, and rewrite pathnames to make it the new project root. This creates a new history containing only files in that directory, with it as the new top of the repository.

As a hedge against mistakes, git filter-branch stores the original branch refs in the namespace refs/original (which you can change with --original). It will refuse to overwrite existing original refs without --force.

The arguments to git filter-branch are interpreted as by git rev-list, selecting the commits to be visited; to use arguments beginning with hyphens, separate them from the filter-branch options with -- as usual. For example, the default argument is HEAD, but you can rewrite all branches with one command: git filter-branch -- --all.

It does not make sense to specify commits to rewrite by commit ID:

$ git filter-branch 27e9535f
Which ref do you want to rewrite?

because when done, git filter-branch needs to update an existing ref to point to the rewritten branch. Ordinarily, you will give a branch name. If you limit the commit range using the negation of a ref, such as master..topic (equivalent to ^master topic), then only the refs mentioned in the positive sense will be updated; here, Git will visit the commits on topic that are not on master, but update only the topic branch when done.

Examples

Expunging Files

Suppose you discover that you have accidentally littered your history with some cruft, such as *.orig and *.rej files from patching, or *~ backup files from Emacs. You can expunge all such files from your entire project history with this command:

$ git filter-branch --index-filter 'git rm -q --cached --ignore-unmatch *.orig *.rej *~' -- --all

You might then add these patterns to your ignore rules, to prevent this from happening again.

Shifting to a Subdirectory

This recipe (using bash syntax) shifts the root of the current project into a subdirectory named sub:

$ git filter-branch --index-filter 'git ls-files -s | perl -pe "s-\\t-$&sub/-" | GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info && mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD

This is adapted from an example in git-filter-branch(1), but using Perl instead of sed for better portability (the original does not work with BSD-style sed commands, such as the one in OS X). It works by rewriting the output of git ls-files like so:

100644 6b1ad9fa764e36… 0       bar
100644 e69de29bb2d1d6… 0       foo/bar

which becomes:

100644 6b1ad9fa764e36… 0       sub/bar
100644 e69de29bb2d1d6… 0       sub/foo/bar

and updating the index accordingly for each commit. You can use this on a clone of the source repository in the git rebase recipe given earlier, to import one repository history into a subdirectory of another.

Updating Tags

In Commit Surgery: git replace, we said to use git filter-branch to apply object replacements made with git replace to the commit graph. There is a problem with this as given, though: it breaks any existing tags pointing to rewritten commits, since they remain untouched and continue to point to the old commits no longer on the rewritten branches. You can avoid this like so:

$ git filter-branch --tag-name-filter cat -- --all

Since --tag-name-filter rewrites tag names from stdin to stdout, cat acts as the identity filter and has Git rewrite all tags with their existing names unchanged as needed.

Warning

This will strip GnuPG signatures from the rewritten tags and commits.

Notes

Keep in mind when importing history in these ways, that while Git preserves the author timestamps in rewritten commits, git log orders its output by the commit timestamps, which will be new. The new history may thus show commits in an unexpected order. This is correct though: the commit’s content was created at one time, and that content was then committed to another repository at a later time. Unfortunately, git log does not have an option to order by author timestamp instead.

Get Git Pocket Guide now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.