Finding Lines in One File But Not in the Other
Problem
You have two data files and you need to compare them and find lines that exist in one file but not in the other.
Solution
Sort the files and isolate the data of interest using cut or awk if necessary, and then use comm, diff, grep, or uniq depending on your needs.
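As a rough sketch of that preparation step, suppose the data lives in two unsorted CSV files and the first field is the value you care about. The filenames (raw_left.csv, raw_right.csv, left.ids, right.ids) and the sample data are made up for illustration; adjust the delimiter and field number to fit your own files.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sample data: unsorted CSV, record ID in the first field.
printf '%s\n' 'record_03,c' 'record_01,a' 'record_02,b' > raw_left.csv
printf '%s\n' 'record_02,x' 'record_04,y' 'record_01,z' > raw_right.csv

# Isolate the first comma-separated field, then sort it.
# LC_ALL=C makes sort use plain byte order, so both files are
# ordered by exactly the same rules.
cut -d, -f1 raw_left.csv  | LC_ALL=C sort > left.ids
cut -d, -f1 raw_right.csv | LC_ALL=C sort > right.ids

cat left.ids
```

For simple data like this, awk -F, '{print $1}' should do the same job as the cut command; awk becomes the better choice when the field of interest needs more than a fixed delimiter to pick out.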
comm is designed for just this type of problem:
$ cat left
record_01
record_02.left only
record_03
record_05.differ
record_06
record_07
record_08
record_09
record_10

$ cat right
record_01
record_02
record_04
record_05
record_06.differ
record_07
record_08
record_09.right only
record_10

# Only show lines in the left file
$ comm -23 left right
record_02.left only
record_03
record_05.differ
record_06
record_09

# Only show lines in the right file
$ comm -13 left right
record_02
record_04
record_05
record_06.differ
record_09.right only

# Only show lines common to both files
$ comm -12 left right
record_01
record_07
record_08
record_10
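The Solution also mentions grep and uniq. The following is only a sketch of how they might be applied to the same left and right files; the variable names (only_left, either, both) are illustrative, and the uniq approach assumes that no line is repeated within a single file.

```shell
# Recreate the sample files in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' record_01 'record_02.left only' record_03 record_05.differ \
    record_06 record_07 record_08 record_09 record_10 > left
printf '%s\n' record_01 record_02 record_04 record_05 record_06.differ \
    record_07 record_08 'record_09.right only' record_10 > right

# grep: -F fixed strings, -x match whole lines, -v invert the match,
# -f read the patterns from a file.
# Prints lines of left that do not appear in right; no sorting required.
only_left=$(grep -Fxv -f right left)

# uniq -u prints lines that occur exactly once in the merged, sorted
# stream, i.e. lines found in only one of the two files.
either=$(sort left right | uniq -u)

# uniq -d prints one copy of each repeated line, i.e. lines found in both.
both=$(sort left right | uniq -d)

printf '%s\n' "$only_left"
```

Here grep should print the same five lines as comm -23 left right. Unlike comm, uniq -u does not say which file a line came from, so it fits best when you only need to know that something differs.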
diff will quickly show you all the differences between the two files, but its output is not terribly pretty and you may not need to know all the differences. GNU diff's -y (side-by-side) and -W (output width) options can be handy for readability, but you can get used to the regular output as well. Some systems (e.g., Solaris) may use sdiff instead of diff -y or have a separate binary such as bdiff to process very large files.
$ diff -y -W 60 left right
record_01                     record_01
record_02.left only         | record_02
record_03                   | record_04
record_05.differ            | record_05
record_06                   | record_06.differ
record_07 ...
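If you do not need every difference, one possible approach is to filter diff's regular output. The sketch below relies on the default format, in which lines from the first file are prefixed with "< " and lines from the second with "> "; the variable names are illustrative. Because diff compares lines by position, the result should match comm only when both files are sorted.

```shell
# Recreate the sample files in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' record_01 'record_02.left only' record_03 record_05.differ \
    record_06 record_07 record_08 record_09 record_10 > left
printf '%s\n' record_01 record_02 record_04 record_05 record_06.differ \
    record_07 record_08 'record_09.right only' record_10 > right

# Keep only the lines diff attributes to the left file ("< " prefix),
# then strip the two-character prefix with cut.
left_only=$(diff left right | grep '^< ' | cut -c3-)

# Same idea for the right file ("> " prefix).
right_only=$(diff left right | grep '^> ' | cut -c3-)

printf '%s\n' "$left_only"
```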