Counting Differences in Files

Problem

You have two files and need to know roughly how many differences exist between them.

Solution

Count the hunks (i.e., sections of changed data) in diff’s output:

$ diff -C0 original_file modified_file | grep -c "^\*\*\*\*\*"
2

$ diff -C0 original_file modified_file
*** original_file       Fri Nov 24 12:48:35 2006
--- modified_file       Fri Nov 24 12:48:43 2006
***************
*** 1 ****
! This is original_file, and this line is different.
--- 1 ---
! This is modified_file, and this line is different.
***************
*** 6 ****
! But this one is different.
--- 6 ---
! But this 1 is different.

If you only need to know whether the files are different, not how many differences there are, use cmp. It exits at the first difference, which can save time on large files. Like diff, it is silent when the files are identical; when they differ, it reports the location of the first difference:

$ cmp original_file modified_file
original_file modified_file differ: char 9, line 1
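
In a script you usually want cmp's exit status rather than its message. The following is a minimal sketch, assuming a POSIX cmp (the -s option suppresses all output); the helper name same_files and the sample files are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: branch on cmp's exit status (0 = identical, 1 = different, >1 = error).
# same_files is a hypothetical helper name, not part of any standard tool.
same_files() {
    cmp -s "$1" "$2"    # -s: print nothing, only set the exit status
}

# Throwaway sample files, invented for this example
tmpdir=$(mktemp -d)
printf 'one\ntwo\n' > "$tmpdir/a"
printf 'one\ntwo\n' > "$tmpdir/b"
printf 'one\n2\n'   > "$tmpdir/c"

if same_files "$tmpdir/a" "$tmpdir/b"; then
    echo "a and b are identical"
fi

if ! same_files "$tmpdir/a" "$tmpdir/c"; then
    echo "a and c differ"
fi

rm -rf "$tmpdir"
```

Because the function's last command is cmp, its exit status is passed straight through to the if statement, so no output parsing is needed.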

Discussion

Hunk is actually the technical term, though we’ve also seen hunks referred to as chunks in some places. Note that it is possible, in theory, to get slightly different results for the same files across different machines or versions of diff, since the number of hunks depends on the algorithm diff uses. You will certainly get different answers when using different diff output formats, as demonstrated below.
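
To see how the output format changes the count, here is a small sketch that runs the same pair of files through several formats. It assumes GNU diff (for the -C0 and -U0 forms); the count_hunks helper and the sample files are hypothetical, invented only for this illustration:

```shell
#!/usr/bin/env bash
# Sketch: count hunks for the same two files under several diff formats.
# count_hunks and the sample files below are made up for illustration.

count_hunks() {
    # $1 = diff format option, $2 = grep pattern that marks a hunk,
    # $3 and $4 = the files to compare
    diff "$1" "$3" "$4" | grep -c "$2"
}

# Seven-line sample files that differ on lines 1 and 6 only
tmpdir=$(mktemp -d)
printf '%s\n' 'first original' same same same same \
    'But this one is different.' same > "$tmpdir/orig"
printf '%s\n' 'first modified' same same same same \
    'But this 1 is different.' same > "$tmpdir/mod"

# Context diffs separate hunks with a line of asterisks;
# unified diffs start each hunk with an @@ header line.
echo "context, 0 lines: $(count_hunks -C0 '^\*\*\*\*\*' "$tmpdir/orig" "$tmpdir/mod")"
echo "context, default: $(count_hunks -c  '^\*\*\*\*\*' "$tmpdir/orig" "$tmpdir/mod")"
echo "unified, 0 lines: $(count_hunks -U0 '^@@' "$tmpdir/orig" "$tmpdir/mod")"
echo "unified, default: $(count_hunks -u  '^@@' "$tmpdir/orig" "$tmpdir/mod")"

rm -rf "$tmpdir"
```

With GNU diff we would expect the two zero-context forms to report 2 and the two default forms to report 1: the default three lines of context on each side bridge the four unchanged lines between the changes, so diff merges them into a single hunk.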

We find a zero-context contextual diff to be the easiest to use for this purpose, and using ...
