Searching for an SSN

Problem

You need a regular expression to match a Social Security number. These numbers are nine digits long, typically grouped as three digits, then two digits, then a final four digits (e.g., 123-45-6789). Sometimes they are written without hyphens, so you need to make hyphens optional in the regular expression.

Solution

$ grep '[0-9]\{3\}-\{0,1\}[0-9]\{2\}-\{0,1\}[0-9]\{4\}' datafile
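As a quick sanity check, the pattern can be run against a small throwaway file. This is only a sketch: the sample names and numbers below are invented for illustration, and the temporary file stands in for datafile.

```shell
# Build a throwaway sample file; its contents are made up for this example.
datafile=$(mktemp)
printf '%s\n' \
    'alice 123-45-6789' \
    'bob 123456789' \
    'carol 12-345-6789' \
    'dave 555-1212' > "$datafile"

# Same pattern as in the Solution, applied to the sample file.
matches=$(grep '[0-9]\{3\}-\{0,1\}[0-9]\{2\}-\{0,1\}[0-9]\{4\}' "$datafile")
printf '%s\n' "$matches"

rm -f "$datafile"
```

Only the alice and bob lines should be printed; the other two do not have their digits in a 3-2-4 arrangement.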

Discussion

Regular expressions like this one are often jokingly called “write-only” expressions, meaning that they can be difficult or impossible to read once written. We’ll take this one apart to help you understand it. In general, though, whenever you use a regular expression in a bash script, put a comment nearby explaining what you intended it to match.
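One way to follow that advice is to keep the pattern in a variable with the explanation right above it. This is a minimal sketch; the names SSN_RE and has_ssn are hypothetical and chosen only for this example.

```shell
#!/usr/bin/env bash
# Match a Social Security number, with or without hyphens:
#   3 digits, optional hyphen, 2 digits, optional hyphen, 4 digits
#   e.g., 123-45-6789 or 123456789
SSN_RE='[0-9]\{3\}-\{0,1\}[0-9]\{2\}-\{0,1\}[0-9]\{4\}'

# has_ssn (hypothetical helper): succeeds if any line of file $1 matches.
has_ssn() {
    grep -q "$SSN_RE" "$1"
}
```

A script could then call it as has_ssn datafile && echo "found an SSN", and a later reader would not need to decode the pattern to learn what it is for.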

Adding spaces to the regular expression would make it easier to read, but it would also change the meaning: the expression would then require space characters at those points. Ignoring that for the moment, let’s insert some spaces into the previous regular expression so that we can read it more easily:

[0-9]\{3\} -\{0,1\} [0-9]\{2\} -\{0,1\} [0-9]\{4\}

The first grouping says “any digit” then “exactly 3 times.” The next grouping says “a hyphen” then “0 or 1 time.” The third grouping says “any digit” then “exactly 2 times.” The next grouping says “a hyphen” then “0 or 1 time.” The last grouping says “any digit” then “exactly 4 times.”
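The same five groupings can be spelled out in a script by building the pattern from named pieces, which gives the readability of the spaced-out version without changing what is matched. This is a sketch; the variable names are invented for illustration. It also tries what should be an equivalent pattern in extended syntax (grep -E), where the braces need no backslashes and ? means “0 or 1 time.”

```shell
# Each piece corresponds to one grouping described above.
d3='[0-9]\{3\}'     # any digit, exactly 3 times
dash='-\{0,1\}'     # a hyphen, 0 or 1 time
d2='[0-9]\{2\}'     # any digit, exactly 2 times
d4='[0-9]\{4\}'     # any digit, exactly 4 times
ssn_re="${d3}${dash}${d2}${dash}${d4}"

# Sample inputs, invented for this example.
samples=$(printf '%s\n' '123-45-6789' '123456789' '123-456789' '1234-5678')

# Basic syntax (as in the Solution) and the extended-syntax equivalent.
bre_out=$(printf '%s\n' "$samples" | grep "$ssn_re")
ere_out=$(printf '%s\n' "$samples" | grep -E '[0-9]{3}-?[0-9]{2}-?[0-9]{4}')
printf '%s\n' "$bre_out"
```

Both forms should select the first three samples. Note that 123-456789 matches because each hyphen is optional on its own, and that the pattern is not anchored, so a longer run of digits that contains nine in a row will match as well.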

See Also
