Chapter 3. Understanding Regular Expression Syntax

When a young child is struggling to understand the meaning of an idiomatic expression, such as “Someone let the cat out of the bag,” you might help by explaining that it’s an expression, and doesn’t literally mean what it says.

An expression, even in computer terminology, is not something to be interpreted literally. It is something that needs to be evaluated. An expression describes a result.

In this chapter, we are going to look at regular expression syntax. A regular expression describes a pattern or a particular sequence of characters, although it does not necessarily specify a single exact sequence.
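As a rough illustration of a pattern that does not pin down a single exact sequence, the sketch below uses Python's re module rather than grep, sed, or awk. The pattern c.t and the word list are made up for this example; the dot stands for any one character, so one expression describes several different words.

```python
import re

# Hypothetical example pattern: "c", then any single character, then "t".
pattern = re.compile(r"c.t")

# Illustrative sample words, not taken from the text.
words = ["cat", "cot", "cut", "ct", "dog"]

# fullmatch() requires the whole word to fit the pattern.
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)
```

Three different words satisfy the same expression, while "ct" fails because the dot must stand for exactly one character.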

While regular expressions are a basic part of UNIX, not everyone has a complete understanding of the syntax. In fact, it can be quite confusing to look at an expression such as:

^□□*.*

which uses metacharacters or special symbols to match a line with one or more leading spaces. (A square box, □, is used to make spaces visible in our examples.)
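The expression above can be tried out in a few lines of Python. This is only a sketch: it uses Python's re module instead of grep or sed, on the assumption that these particular metacharacters (^, *, and .) behave the same way there. Each box is typed as a real space, and the helper name has_leading_space and the sample lines are invented for the example.

```python
import re

# ^    anchor the match at the start of the line
# " "  one literal space (the first box)
# " *" zero or more additional spaces (the second box followed by *)
# .*   any characters up to the end of the line
LEADING_SPACES = re.compile(r"^  *.*")

def has_leading_space(line: str) -> bool:
    """Return True if the line begins with at least one space."""
    return LEADING_SPACES.match(line) is not None

# Illustrative input lines, not taken from the text.
lines = ["no indent", " one space", "    four spaces", "\ttab only", ""]
matched = [line for line in lines if has_leading_space(line)]
print(matched)
```

Only the lines that start with a space are kept. The line that starts with a tab is rejected, because the pattern asks for a literal space and not for whitespace in general.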

If you use any UNIX text editor on a routine basis, you are probably somewhat familiar with regular expression syntax. grep, sed, and awk all use regular expressions. However, not all of the metacharacters used in regular expression syntax are available in all three programs. The basic set of metacharacters was introduced with the ed line editor and made available in grep. sed uses the same set of metacharacters. Later, a program named egrep was introduced that offered an extended set of metacharacters. ...
