Chapter 11. Understanding Patches

A “patch” is a compact representation of the differences between two files, intended for use with line-oriented text files. It describes how to turn one file into another, and is asymmetric: the patch from file1 to file2 is not the same as the patch for the other direction (it would say to delete and add opposite lines, as we will see). The patch format uses context as well as line numbers to locate differing file regions, so that a patch can often be applied to a somewhat earlier or later version of the first file than the one from which it was derived, as long as the applying program can still locate the context of the change.

The terms “patch” and “diff” are often used interchangeably, although there is a distinction, at least historically. A diff only need show the differences between two files, and can be quite minimal in doing so. A patch is an extension of a diff, augmented with further information such as context lines and filenames, which allow it to be applied more widely. These days, the Unix diff program can produce patches of various kinds.

Here’s a simple patch, generated by git diff:

diff --git a/foo.c b/foo.c
index 30cfd169..8de130c2 100644
--- a/foo.c
+++ b/foo.c
@@ -1,5 +1,5 @@
 #include <string.h>

 int check (char *string) {
-    return !strcmp(string, "ok");
+    return (string != NULL) && !strcmp(string, "ok");
 }

Breaking this into sections:

diff --git a/foo.c b/foo.c

This is the Git diff header; diff --git isn’t a literal command, but rather just suggests the notion of a Git-specific diff in Unix command style. a/foo.c and b/foo.c are the files being compared, with added leading directory names a and b to distinguish them in case they are the same (as they are here; this patch shows the changes from one version to another of the same file). To generate this patch, I changed the file foo.c and ran git diff, which shows the unstaged changes between the working tree and the index. There are in fact no directories named a and b in the repository; they are just convention:

index 30cfd169..8de130c2 100644

This is an extended header line, one of several possible forms, though there is only one in this patch. This line gives information from the Git index regarding this file: 30cfd169 and 8de130c2 are the blob IDs of the A and B versions of the file contents being compared, and 100644 are the “mode bits,” indicating that this is a regular file: not executable and not a symbolic link (the use of .. here between the blob IDs is just as a separator and has nothing to do with its use in naming either sets of revs or for git diff). Other header lines might indicate the old and new modes if that had changed, old and new filenames if the file were being renamed, etc.

The blob IDs are helpful if this patch is later applied by Git to the same project and there are conflicts while applying it. If those blobs are in the object database, then Git can use them to perform a three-way merge with those two versions and the working copy, to help you resolve the conflicts. The patch still makes sense to other tools besides Git; they will just ignore this line and not be able to use the extra information:

--- a/foo.c
+++ b/foo.c

This is the traditional “unified diff” header, again showing the files being compared and the direction of the changes, which will be shown later: minus signs will show lines in the A version but missing from the B version; and plus signs, lines missing in A but present in B. If the patch were of this file being added or deleted in its entirety, one of these would be /dev/null to signal that:

@@ -1,5 +1,5 @@
 #include <string.h>

 int check (char *string) {
-    return !strcmp(string, "ok");
+    return (string != NULL) && !strcmp(string, "ok");
 }

This is a difference section, or “hunk,” of which there is just one in this diff. The line beginning with @@ indicates by line number and length the positions of this hunk in the A and B versions; here, the hunk starts at line 1 and extends for 5 lines in both versions. The subsequent lines beginning with a space are context: they appear as shown in both versions of the file. The lines beginning with minus and plus signs have the meanings just mentioned: this patch replaces a single line, fixing a common C bug whereby the program would crash if this function were passed a null pointer as its string argument.

A single patch file can contain the differences for any number of files, and git diff produces diffs for all altered files in the repository in a single patch (unlike the usual Unix diff command, which requires extra options to recursively process whole directory trees).

Applying Plain Diffs

If you save the output of git diff to a file (e.g., with git diff > foo.patch), you can apply it to the same or a similar version of the file elsewhere with git apply, or with other common tools that handle diff format, such as patch (although they won’t be able to use any extra Git-specific information in the diff). This is useful for saving a set of uncommitted changes to apply to a different set of files, or for transmitting any set of changes to someone else who is not using Git.

You can use the output of git show commit as a patch representing the changes for a given nonmerge commit, as a shortcut for git diff commit~ commit (explicitly comparing a commit and its parent).

Patches with Commit Information

There is another patch format, specific to Git, that contains not only patches for some number of files, but also commit metadata: the author, timestamp, and message. This carries all the information needed to reapply the changes from one commit as a new commit elsewhere, and is useful for transmitting a commit when it is not possible or convenient to do so with the usual Git push/pull mechanism.

You produce this patch format with git format-patch, and apply it with git am. The patch itself is actually in the venerable Unix mailbox format, using the email “from,” “date,” and “subject” headers as the author, timestamp, and commit message subject, and the email body as the rest of the message. A commit patch for the previous example might look like this:

From ccadc07f2e22ed56c546951… Mon Sep 17 00:00:00 2001
From: "Richard E. Silverman" <res@oreilly.com>
Date: Mon, 11 Feb 2013 00:42:41 -0500
Subject: [PATCH] fix null-pointer bug in check()

It is truly a wonder that we continue to write
high-level application software in what is essentially
assembly language. We deserve all the segfaults we
get.

---
foo.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/foo.c b/foo.c
...

The initial line contains the original commit ID, and a fixed timestamp meant to signal that this “email” was produced by git format-patch. The [PATCH] prefix in the subject is not part of the commit message, but rather intended to distinguish patches among other email messages. A diffstat summary of the patch comes next (which you can suppress with --no-stat (-p)), followed by the patch itself in the format shown earlier.

You run git format-patch like so:

$ git format-patch [options] [revisions]

The revisions argument can be any expression specifying a set of commits to format, as described in Chapter 8. As an exception, however, a single commit C means C..HEAD, i.e., the commits on the current branch not contained in C (if C is on this branch, these are the commits made since C). To get the other meaning instead—that is, all commits reachable from C—use the --root option.

By default, Git writes patches for the selected commits into sequentially numbered files in the current directory, with names reflecting the commit message subject lines like so:

0001-work-around-bug-with-DNS-KDC-location.patch
0002-use-DNS-realm-mapping-even-for-local-host.patch
0003-fix-AP_REQ-authenticator-bug.patch
0004-add-key-extraction-to-kadmin.patch

The leading numbers in the filenames makes it easy to apply these patches in order with git am *.patch, since the shell will sort them lexicographically when expanding the wildcard. You can give a different output directory with --output-directory (-o), or write the patches all to standard output with --stdout; git am reads from standard input if given no file arguments (or a single hyphen as the only argument, git am -).

git format-patch takes a number of other options for controlling the resulting email format, such as adding other mail headers, as well as many options taken by git diff to affect the diff itself; see git-format-patch(1) and git-diff(1) for more detail. Also see Importing Linear History for some examples.

Get Git Pocket Guide now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.