Records and Fields
Each iteration of the implicit loop over the input files in awk's programming model processes a single record, typically a line of text. Records are further divided into smaller strings, called fields.
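As a minimal sketch of this model, the following shell fragment feeds two lines to awk and reports, for each record, its number (NR), its field count (NF), and its first field. The sample input and the variable name basic_out are made up for illustration.

```shell
# Each input line is one record; default whitespace splitting yields the fields.
basic_out=$(printf 'one two\nthree four five\n' |
    awk '{ print NR ": " NF " fields, first = " $1 }')
echo "$basic_out"
# 1: 2 fields, first = one
# 2: 3 fields, first = three
```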
Record Separators
Although records are normally text lines separated by newline characters, awk allows more generality through the record-separator built-in variable, RS.
In traditional and POSIX awk, RS must be either a single literal character, such as newline (its default value), or an empty string. The latter is treated specially: records are then paragraphs separated by one or more blank lines, and empty lines at the start or end of a file are ignored. Fields are then separated by newlines or whatever FS is set to.
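The paragraph behavior of an empty RS can be sketched as follows. The input is invented for illustration: two paragraphs, with extra blank lines at the start, between them, and at the end. With the default FS, the newlines inside a paragraph act as field separators along with spaces.

```shell
# Paragraph mode: an empty RS makes each blank-line-separated block one record.
# Leading and trailing blank lines are ignored, so only two records are seen.
paragraphs=$(printf '\n\nalpha beta\ngamma\n\n\ndelta\nepsilon zeta\n\n' |
    awk 'BEGIN { RS = "" } { print NR ": " NF " fields, first = " $1 }')
echo "$paragraphs"
# 1: 3 fields, first = alpha
# 2: 3 fields, first = delta
```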
gawk and mawk provide an important extension: RS may be a regular expression, provided that it is longer than a single character. Thus, RS = "+" matches a literal plus, whereas RS = ":+" matches one or more colons. This provides much more powerful record specification, which we exploit in some of the examples in Section 9.6.
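The difference between the two settings can be sketched like this. The AWK variable and the sample strings are illustrative assumptions. The fragment prefers gawk or mawk when either is installed, because a plain awk may not accept a regular expression in RS.

```shell
# Prefer gawk or mawk, which accept a regular expression in RS;
# fall back to plain awk, whose behavior here is implementation dependent.
AWK=$(command -v gawk || command -v mawk || echo awk)

# Single character: "+" is taken literally, not as a regex operator.
plus_out=$(printf 'a+b+c' | "$AWK" 'BEGIN { RS = "+" } { print NR ": " $0 }')

# More than one character: ":+" is a regex matching one or more colons.
colon_out=$(printf 'a::b:::c' | "$AWK" 'BEGIN { RS = ":+" } { print NR ": " $0 }')

echo "$plus_out"
echo "$colon_out"
```

With gawk or mawk, both commands should report three records (a, b, and c). An awk that honors only the first character of RS would instead see empty records between adjacent colons.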
With a regular expression record separator, the text that matches the separator can no longer be determined from the value of RS. gawk provides it as a language extension in the built-in variable RT, but mawk does not.
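A small sketch of RT in use, assuming gawk is installed (the guard and the variable name rt_out are illustrative). Each record is printed followed by the text that terminated it, in brackets.

```shell
# RT is a gawk extension, so this sketch only runs when gawk is available.
if command -v gawk >/dev/null 2>&1; then
    rt_out=$(printf 'a::b:::c' |
        gawk 'BEGIN { RS = ":+" } { print $0 " [" RT "]" }')
else
    rt_out="gawk not available"
fi
echo "$rt_out"
```

Under gawk, the first two records show the colon runs that ended them, and the last record shows an empty RT because end of input, rather than a separator match, terminated it.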
Without the extension of RS to regular expressions, it can be hard to simulate regular expressions as record separators, if they can match across line boundaries, because most Unix text processing tools deal with a ...