Finding Lines in One File But Not in the Other

Problem

You have two data files and you need to compare them and find lines that exist in one file but not in the other.

Solution

Sort the files and isolate the data of interest using cut or awk if necessary, and then use comm, diff, grep, or uniq depending on your needs.

comm is designed for just this type of problem:

$ cat left
record_01
record_02.left only
record_03
record_05.differ
record_06
record_07
record_08
record_09
record_10

$ cat right
record_01
record_02
record_04
record_05
record_06.differ
record_07
record_08
record_09.right only
record_10

# Only show lines in the left file
$ comm -23 left right
record_02.left only
record_03
record_05.differ
record_06
record_09

# Only show lines in the right file
$ comm -13 left right
record_02
record_04
record_05
record_06.differ
record_09.right only

# Only show lines common to both files
$ comm -12 left right
record_01
record_07
record_08
record_10
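
comm expects both inputs to be sorted, so unsorted data needs a sort step first, and cut can isolate the field you care about. The following is a minimal sketch under assumptions that are not in the original recipe: the file names, the comma-separated layout, and the variable names are all hypothetical.

```shell
# Hypothetical sample data: unsorted CSV files with a key in the first field
workdir=$(mktemp -d)
printf '%s\n' 'u3,carol' 'u1,alice' 'u2,bob' > "$workdir/left.csv"
printf '%s\n' 'u2,robert' 'u4,dave' 'u1,alice' > "$workdir/right.csv"

# Isolate the key field with cut, then sort so comm gets ordered input
cut -d, -f1 "$workdir/left.csv" | sort > "$workdir/left.keys"
cut -d, -f1 "$workdir/right.csv" | sort > "$workdir/right.keys"

# Same comm options as above, applied to the extracted keys
keys_only_left=$(comm -23 "$workdir/left.keys" "$workdir/right.keys")
keys_only_right=$(comm -13 "$workdir/left.keys" "$workdir/right.keys")
keys_in_both=$(comm -12 "$workdir/left.keys" "$workdir/right.keys")

echo "Keys only in left:";  echo "$keys_only_left"
echo "Keys only in right:"; echo "$keys_only_right"
echo "Keys in both:";       echo "$keys_in_both"

rm -rf "$workdir"
```

Comparing only the key field means that u2 counts as common to both files even though the rest of its line differs.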

diff will quickly show you all the differences between the two files, but its output is not terribly pretty and you may not need to know all the differences. GNU diff's -y and -W options can be handy for readability, but you can get used to the regular output as well. Some systems (e.g., Solaris) may use sdiff instead of diff -y or have a separate binary such as bdiff to process very large files.

$ diff -y -W 60 left right
record_01                     record_01
record_02.left only         | record_02
record_03                   | record_04
record_05.differ            | record_05
record_06                   | record_06.differ
record_07                     record_07
...
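
The Solution also lists grep as an option. One possible approach, offered here as a sketch rather than as the recipe's own method, is to use one file as a list of fixed-string patterns and print the lines of the other file that match none of them. Unlike comm, this does not need sorted input. The sample data and variable names are hypothetical.

```shell
# Hypothetical sample data, deliberately unsorted
workdir=$(mktemp -d)
printf '%s\n' record_03 record_01 record_02 > "$workdir/left"
printf '%s\n' record_02 record_04 record_01 > "$workdir/right"

# -F  treat patterns as fixed strings, not regular expressions
# -x  match whole lines only
# -v  invert the match (print lines that match no pattern)
# -f  read the patterns from a file
missing_from_right=$(grep -Fxv -f "$workdir/right" "$workdir/left")
missing_from_left=$(grep -Fxv -f "$workdir/left" "$workdir/right")

echo "In left but not in right: $missing_from_right"
echo "In right but not in left: $missing_from_left"

rm -rf "$workdir"
```

Without -F and -x, a line such as record_02 would be treated as a regular expression and would also match record_02.left only, which hides real differences.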
