Summary

The grep program is the primary tool for extracting interesting lines of text from input data files. POSIX mandates a single version, with options (-E and -F) that provide the behavior traditionally obtained from the three grep variants: grep, egrep, and fgrep.
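As a rough illustration of how one grep covers the three traditional variants, the sketch below runs the same input through plain grep, grep -E, and grep -F. The sample lines and variable names are invented for this example.

```shell
# Sample input (hypothetical data, created only for this sketch).
sample=$(printf '%s\n' 'alpha 1.5' 'beta 15' 'gamma 1x5' 'alpha|beta')

# grep: basic regular expressions; the dot matches any single character.
bre_out=$(printf '%s\n' "$sample" | grep '1.5')

# grep -E: extended regular expressions (traditionally egrep); | is alternation.
ere_out=$(printf '%s\n' "$sample" | grep -E '^(beta|gamma) ')

# grep -F: fixed strings (traditionally fgrep); no character is special,
# so only the line containing a literal "1.5" is selected.
fixed_out=$(printf '%s\n' "$sample" | grep -F '1.5')

printf 'BRE:\n%s\nERE:\n%s\nfixed:\n%s\n' "$bre_out" "$ere_out" "$fixed_out"
```

Note that the plain grep call also selects the line containing 1x5, because an unescaped dot is a metacharacter there; the -F call does not.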

Although you can search for plain string constants, regular expressions provide a more powerful way to describe text to be matched. Most characters match themselves, whereas certain others act as metacharacters, specifying actions such as "match zero or more of," "match exactly 10 of," and so on.
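The two quoted actions can be sketched with grep -E as follows. The input lines and variable names are made up for illustration, and the patterns are anchored with ^ and $ so that each one must describe the whole line.

```shell
# Hypothetical test lines for this sketch.
lines=$(printf '%s\n' 'ac' 'abc' 'abbbc' '1234567890' '12345')

# "Match zero or more of": b* accepts any number of b characters, including none.
star_out=$(printf '%s\n' "$lines" | grep -E '^ab*c$')

# "Match exactly 10 of": the interval expression {10} repeats the
# preceding bracket expression exactly ten times.
ten_out=$(printf '%s\n' "$lines" | grep -E '^[[:digit:]]{10}$')

printf 'star:\n%s\nten:\n%s\n' "$star_out" "$ten_out"
```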

POSIX regular expressions come in two flavors: Basic Regular Expressions (BREs) and Extended Regular Expressions (EREs). Which flavor a given program uses is a matter of historical practice; the POSIX specification reduced the number of flavors to just two. EREs are mostly, but not completely, a superset of BREs.
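One way to see that the two flavors are not simply nested is to give the same pattern text to both. The sketch below uses grep for BREs and grep -E for EREs; the input lines and variable names are assumptions made for the example.

```shell
# Hypothetical input for this sketch.
input=$(printf '%s\n' 'aab' 'a+b' 'abab')

# BRE: + is an ordinary character, so this matches the literal text a+b.
bre_plus=$(printf '%s\n' "$input" | grep 'a+b')

# ERE: + means "one or more of the preceding item".
ere_plus=$(printf '%s\n' "$input" | grep -E '^a+b$')

# BRE grouping and interval expressions need backslashes: \( \) and \{ \}.
bre_group=$(printf '%s\n' "$input" | grep '^\(ab\)\{2\}$')

# ERE writes the same pattern without the backslashes.
ere_group=$(printf '%s\n' "$input" | grep -E '^(ab){2}$')

printf '%s\n' "$bre_plus" "$ere_plus" "$bre_group" "$ere_group"
```

The same characters, a+b, select different lines depending on the flavor, which is why it matters to know which one a given program expects.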

Regular expressions are sensitive to the locale in which the program runs; in particular, ranges within a bracket expression should be avoided in favor of character classes such as [[:alnum:]]. Many GNU programs have additional metacharacters.
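A minimal sketch of the character-class advice, using invented sample words. The second command also pins the locale with LC_ALL=C, which is one common way (not the only one) to make a script behave the same everywhere.

```shell
# Hypothetical sample words for this sketch.
words=$(printf '%s\n' 'abc123' 'hello world' 'x_y' 'Zebra9')

# [[:alnum:]] matches whatever the current locale treats as letters and
# digits, without depending on how a range such as [a-z] collates.
alnum_out=$(printf '%s\n' "$words" | grep -E '^[[:alnum:]]+$')

# With LC_ALL=C the classes are limited to their ASCII members.
c_out=$(printf '%s\n' "$words" | LC_ALL=C grep -E '^[[:upper:]][[:alnum:]]*$')

printf 'alnum:\n%s\nC locale:\n%s\n' "$alnum_out" "$c_out"
```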

sed is the primary tool for making simple string substitutions. Since, in our experience, most shell scripts use sed only for substitutions, we have purposely not covered everything sed can do. The sed & awk book listed in Chapter 16 provides more information.
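The kinds of substitution most scripts rely on can be sketched as follows. The input strings, the path, and the variable names are hypothetical and chosen only for illustration.

```shell
# Replace only the first match on each line.
first=$(echo 'one fish, two fish' | sed 's/fish/cat/')

# The g flag replaces every match on the line.
all=$(echo 'one fish, two fish' | sed 's/fish/cat/g')

# Any character can delimit the s command; using ; here avoids
# having to escape the slashes in a (made-up) path.
path=$(echo '/home/tolstoy/notes' | sed 's;/home/tolstoy;/home/lt;')

# & in the replacement stands for the entire matched text.
amp=$(echo 'error 42' | sed 's/[[:digit:]][[:digit:]]*/<&>/')

printf '%s\n' "$first" "$all" "$path" "$amp"
```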

The "longest leftmost" rule describes where text matches and for how long ...
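Two small sed commands give a feel for the rule. This is only a sketch with invented input strings: the first shows the match extending as far as it can, the second shows the earliest possible starting point winning even when the match is empty.

```shell
# Longest: .* extends to the final y on the line, so the whole
# line is replaced, not just the first word.
longest=$(echo 'Tolstoy is worldly' | sed 's/T.*y/Camus/')

# Leftmost: b* can match the empty string, and the first place it can
# do so is the very start of the line, before the a.
leftmost=$(echo 'abc' | sed 's/b*/1/')

printf '%s\n' "$longest" "$leftmost"
```

The second result, 1abc, surprises many readers: the b in the middle is never touched, because an empty match further left takes priority.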
