Records and Fields

In awk's programming model, each iteration of the implicit loop over the input files processes a single record, typically a line of text. Records are further divided into smaller strings called fields.
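As a minimal illustration of that loop, the sketch below feeds awk two made-up input lines. The single rule runs once per record and reports the record number (NR), the field count (NF), and the first field. The function name show_records and the sample text are invented for this example.

```shell
# Hypothetical two-line input; under the default RS, each line is one record.
show_records() {
    printf 'one two three\nfour five\n' |
    awk '{ print "record " NR ": " NF " fields, first is " $1 }'
}
show_records
```

The first line should be reported as a record with three fields, and the second as a record with two.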

Record Separators

Although records are normally text lines separated by newline characters, awk allows more generality through the record-separator built-in variable, RS.

In traditional and POSIX awk, RS must be either a single literal character, such as newline (its default value), or an empty string. The latter is treated specially: records are then paragraphs separated by one or more blank lines, and empty lines at the start or end of a file are ignored. In that mode, a newline always acts as a field separator, in addition to whatever FS is set to.

gawk and mawk provide an important extension: RS may be a regular expression, provided that it is longer than a single character. Thus, RS = "+" matches a literal plus, whereas RS = ":+" matches one or more colons. This provides much more powerful record specification, which we exploit in some of the examples in Section 9.6.
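The difference between a single-character RS and a regular-expression RS can be seen in a small sketch. It assumes that the awk command on the system is gawk or mawk (or another implementation with the same extension); a strictly traditional awk may behave differently on the second command. The input string and the function name rs_demo are made up for this example, and the input deliberately has no trailing newline so that the final record contains none.

```shell
# Assumes awk is gawk, mawk, or another awk with regular-expression RS.
rs_demo() {
    # Single character: "+" is taken literally, not as a regex operator.
    printf 'a+b::c:::d' | awk -v RS='+' '{ print NR ": " $0 }'
    # Longer than one character: ":+" is a regex matching one or more colons.
    printf 'a+b::c:::d' | awk -v RS=':+' '{ print NR ": " $0 }'
}
rs_demo
```

The first command should split the input into two records (a and b::c:::d), and the second into three (a+b, c, and d).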

With a regular expression record separator, the text that matches the separator can no longer be determined from the value of RS. gawk provides it as a language extension in the built-in variable RT, but mawk does not.
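A short sketch of RT follows. It requires gawk specifically, so the function checks for it first. The input and the function name rt_demo are invented for illustration.

```shell
# RT is a gawk extension; this sketch does nothing useful under other awks.
rt_demo() {
    command -v gawk >/dev/null 2>&1 || { echo "gawk not found" >&2; return 1; }
    # Print each record together with the text that actually terminated it.
    printf 'a:b:::c' |
    gawk -v RS=':+' '{ printf "record=%s separator=[%s]\n", $0, RT }'
}
rt_demo
```

With gawk, the first record should be reported with a single colon as its separator, the second with three colons, and the last with an empty RT, because the input ends without a separator match.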

Without the extension of RS to regular expressions, it can be hard to simulate regular expressions as record separators, if they can match across line boundaries, because most Unix text processing tools deal with a ...
