Counting String Values
Problem
You need to count all the occurrences of several different strings, including some strings whose values you don’t know beforehand. That is, you’re not trying to count the occurrences of a pre-determined set of strings. Rather, you are going to encounter some strings in your data and you want to count these as-yet-unknown strings.
Solution
Use awk’s associative arrays (also known as hashes) for your counting.
For our example, we’ll count how many files are owned by various users on our system. The username shows up as the third field in ls -l output, so we’ll use that field ($3) as the index of the array and increment that member of the array:
    #
    # cookbook filename: asar.awk
    #
    NF > 7 {
        user[$3]++
    }
    END {
        for (i in user)
        {
            printf "%s owns %d files\n", i, user[i]
        }
    }
We invoke awk a bit differently here. Because this awk script is a bit more complex, we’ve put it in a separate file. We use the -f option to tell awk where to get the script file:
    $ ls -lR /usr/local | awk -f asar.awk
    bin owns 68 files
    albing owns 1801 files
    root owns 13755 files
    man owns 11491 files
    $
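To try the counting idiom without depending on what happens to live under /usr/local, here is a minimal sketch that feeds the same awk logic a few made-up lines shaped like ls -lR output. The usernames alice and bob, the filenames, and the shell variables sample and counts are assumptions for illustration only. Because awk does not guarantee the order in which for (i in user) visits the array, the sketch pipes the result through sort so the output is stable.

```shell
#!/bin/sh
# Hypothetical sample input shaped like `ls -lR` output (not real data).
sample='total 8
-rw-r--r-- 1 alice staff 120 Jan  1 10:00 notes.txt
-rw-r--r-- 1 bob   staff  64 Jan  2 11:30 todo.txt
-rw-r--r-- 1 alice staff 512 Jan  3 09:15 report.txt

./sub:
total 4
-rw-r--r-- 1 alice staff  33 Jan  4 08:00 draft.txt'

# Same logic as the recipe: count lines per owner ($3), keyed by username.
# sort makes the otherwise unspecified array traversal order predictable.
counts=$(printf '%s\n' "$sample" |
    awk 'NF > 7 { user[$3]++ }
         END { for (i in user) printf "%s owns %d files\n", i, user[i] }' |
    sort)

printf '%s\n' "$counts"
```

With this sample input the sketch prints "alice owns 3 files" followed by "bob owns 1 files". Neither name had to be declared in advance; each array element comes into existence the first time its index is incremented.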
Discussion
We use the condition NF > 7 as a qualifier to part of the awk script to weed out the lines that do not contain filenames. Such lines appear in the ls -lR output and are useful for readability: they include blank lines that separate different directories, as well as total counts for each subdirectory. They don’t have as many fields (or words). The expression NF > 7 that precedes the ...
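To see what the field-count test does, the following sketch prints NF for a few representative lines and reports whether each would pass NF > 7. The sample lines and the variable name nf_report are hypothetical, chosen only to resemble the kinds of lines ls -lR emits.

```shell
#!/bin/sh
# Hypothetical lines: a "total" line, a file entry, a blank separator,
# and a directory header, as they might appear in `ls -lR` output.
nf_report=$(printf '%s\n' \
    'total 8' \
    '-rw-r--r-- 1 alice staff 120 Jan  1 10:00 notes.txt' \
    '' \
    './sub:' |
    awk '{ printf "%d fields: %s\n", NF, (NF > 7 ? "counted" : "skipped") }')

printf '%s\n' "$nf_report"
```

Only the file entry, with nine whitespace-separated fields, passes the test; the total line (two fields), the blank line (zero), and the directory header (one) are skipped.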