We have used the word-count utility, wc, a few times before. It is probably one of the oldest, and simplest, tools in the Unix toolbox, and POSIX standardizes it. By default, wc outputs a one-line report of the number of lines, words, and bytes:
$ echo This is a test of the emergency broadcast system | wc       # Report counts
1 9 49
Request a subset of those results with the -c (bytes), -l (lines), and -w (words) options:
$ echo Testing one two three | wc -c       # Count bytes
22
$ echo Testing one two three | wc -l       # Count lines
1
$ echo Testing one two three | wc -w       # Count words
4
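The options can also be combined. The following is a minimal sketch, not taken from the original text, that relies on the POSIX rule that wc always reports its counts in the fixed order lines, words, bytes, whatever order the options are given in. The variable names counts, lines, and words are illustrative only.

```shell
# Ask for words and lines together; the report still lists lines first.
counts=$(echo Testing one two three | wc -w -l)

# read splits on whitespace, so any padding that wc adds is discarded.
read -r lines words <<EOF
$counts
EOF

echo "lines=$lines words=$words"
```

Splitting the report with read rather than using it directly avoids depending on the amount of padding, which differs between wc implementations.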
The -c option originally stood for character count, but with multibyte character-set encodings, such as UTF-8, in modern systems, bytes are no longer synonymous with characters, so POSIX introduced the -m option to count multibyte characters. For 8-bit character data, it is the same as -c.
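To make the difference between -c and -m concrete, here is a small sketch that feeds wc a word containing one two-byte UTF-8 character. The sample text and the variable names bytes and chars are assumptions chosen for illustration. The value reported by -m depends on the current locale, so only the byte count is fixed.

```shell
# "café" plus a newline; \303\251 is the two-byte UTF-8 encoding of 'é'.
bytes=$(printf 'caf\303\251\n' | wc -c)

# -m counts characters as the current locale decodes them.
chars=$(printf 'caf\303\251\n' | wc -m)

# $(( )) strips the leading padding that some wc implementations emit.
echo "bytes: $((bytes))"
echo "chars: $((chars))"
```

The byte count is 6. In a UTF-8 locale the character count should be 5, while in a single-byte locale it is normally the same as the byte count.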
Although wc is most commonly used with input from a pipeline, it also accepts command-line file arguments, producing a one-line report for each, followed by a summary report:
$ wc /etc/passwd /etc/group       # Count data in two files
26 68 1631 /etc/passwd
10376 10376 160082 /etc/group
10402 10444 161713 total
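Because file contents differ from system to system, a reproducible variant of this example can use scratch files. The sketch below assumes the mktemp command is available (it is widespread but not required by older POSIX editions), and the file names a.txt and b.txt are hypothetical. It also shows that wc omits the file name when it reads from standard input, which is convenient when a script wants only the number.

```shell
# Hypothetical scratch files, created only for this illustration.
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/a.txt"
printf 'four five\n' > "$tmpdir/b.txt"

# With file arguments, each report line ends with the file name, and a
# "total" line follows because more than one file was named.
wc -l "$tmpdir/a.txt" "$tmpdir/b.txt"

# Reading from standard input yields the bare count, with no name.
n=$(wc -l < "$tmpdir/a.txt")
echo "a.txt has $((n)) lines"

rm -r "$tmpdir"
```

The last echo prints "a.txt has 3 lines"; the arithmetic expansion removes any padding in front of the count.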
Modern versions of wc are locale-aware: set the environment variable LC_CTYPE to the desired locale to influence wc's interpretation of byte sequences as characters and word separators.
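The effect of the locale can be seen by running the same input through wc -m under two settings. This is a sketch under stated assumptions: en_US.UTF-8 is an assumed locale name that may not be installed (locale -a lists what is available), and the variable names are illustrative. Note that LC_ALL, when it is set, takes precedence over LC_CTYPE, so the sketch sets LC_ALL for each invocation to make sure the choice takes effect.

```shell
# "naïve" plus a newline: 7 bytes; \303\257 is the UTF-8 encoding of 'ï'.
sample='na\303\257ve\n'

# In the C locale, wc -m typically treats every byte as one character.
in_c=$(printf "$sample" | LC_ALL=C wc -m)
echo "C locale: $((in_c)) characters"

# Assumed locale name; run the comparison only if it is installed.
utf8_locale=en_US.UTF-8
if locale -a 2>/dev/null | grep -qi '^en_US\.utf-*8$'; then
    in_utf8=$(printf "$sample" | LC_ALL=$utf8_locale wc -m)
    echo "$utf8_locale: $((in_utf8)) characters"
fi
```

When the UTF-8 locale is present, its count should be one lower than the C-locale count, because the two bytes of 'ï' are decoded as a single character.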
In Chapter 5, we will develop a related tool, wf, to report the frequency of occurrence of each word.