Counting String Values

Problem

You need to count all the occurrences of several different strings, including some whose values you don’t know beforehand. That is, you’re not counting the occurrences of a predetermined set of strings. Rather, you’ll encounter strings in your data and want to count each one as it turns up.

Solution

Use awk’s associative arrays (also known as hashes) for your counting.
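Here is a minimal sketch of the idea on its own, separate from the file-ownership example that follows. It counts arbitrary strings read one per line, with no list of expected values given in advance. It wraps awk in a shell function, and the name count_strings is hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# count_strings (hypothetical helper): read strings, one per line, on stdin
# and print "<string> <count>" for each distinct string.
count_strings() {
    awk '
        # $0 is the whole line; use it as the array index.
        # An index not seen before starts out empty (numerically 0),
        # so ++ creates the entry and counts it in one step.
        { count[$0]++ }
        END {
            for (s in count)
                print s, count[s]
        }
    '
}

# Example: three strings, one of which repeats.
# for-in order is unspecified, so sort the report for stable output.
printf '%s\n' apple pear apple | count_strings | sort
```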

For our example, we’ll count how many files are owned by each user on our system. The username shows up as the third field in ls -l output, so we’ll use that field ($3) as the index of the array and increment that member of the array:

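#
# cookbook filename: asar.awk
#
# Lines with more than 7 fields are file entries in ls -l output;
# field 3 is the owner. Count one file for that owner.
NF > 7 {
    user[$3]++
}
# After all input is read, report the count for each owner.
# Note: for (i in user) visits the keys in an unspecified order.
END {
    for (i in user) {
        printf "%s owns %d files\n", i, user[i]
    }
}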

We invoke awk a little differently here. Because this awk script is more complex, we’ve put it in a separate file and use the -f option to tell awk where to find the script:

$ ls -lR /usr/local | awk -f asar.awk
bin owns 68 files
albing owns 1801 files
root owns 13755 files
man owns 11491 files
$
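To try the same logic without depending on what is installed under /usr/local, the sketch below feeds a few hand-written lines shaped like ls -lR output through the asar.awk program, given inline rather than with -f. The sample data is hypothetical, not a real listing, and count_owners is a hypothetical function name. Because for (i in user) returns owners in an unspecified order, the sketch also pipes the report through sort -k3,3nr so the largest count comes first. The count is the third whitespace-separated field of each report line.

```shell
#!/usr/bin/env bash
# Hypothetical sample shaped like `ls -lR` output: a directory header,
# a "total" line, file entries (9 fields each), and a blank separator.
sample='/usr/local/bin:
total 8
-rwxr-xr-x 1 root root 1024 Jan  1 12:00 tool
-rwxr-xr-x 1 root root 2048 Jan  1 12:00 helper
-rw-r--r-- 1 albing albing 512 Jan  1 12:00 notes

/usr/local/man:
total 4
-rw-r--r-- 1 man man 300 Jan  1 12:00 page.1'

# count_owners (hypothetical): same logic as asar.awk, then a numeric,
# descending sort on field 3 (the count) of each report line.
count_owners() {
    awk 'NF > 7 { user[$3]++ }
         END { for (i in user) printf "%s owns %d files\n", i, user[i] }' |
        sort -k3,3nr
}

printf '%s\n' "$sample" | count_owners
```

Only the four file entries pass the NF > 7 test; the headers, totals, and blank line are skipped.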

Discussion

We use the condition NF > 7 as a qualifier on part of the awk script to weed out the lines that do not contain filenames. Such lines appear in ls -lR output to aid readability: blank lines that separate different directories, as well as total counts for each subdirectory. These lines don’t have as many fields (or words). The expression NF>7 that precedes the ...
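To see which lines the NF > 7 test discards, this sketch prints the field count awk computes for each kind of line in ls -lR output. The sample lines are hypothetical, and show_nf is a hypothetical helper name. With awk’s default whitespace splitting, a long-listing entry has nine fields (permissions, link count, owner, group, size, three date/time fields, and the name), while directory headers, totals, and blank lines have two or fewer.

```shell
#!/usr/bin/env bash
# show_nf (hypothetical): print NF, a tab, then the line itself.
show_nf() {
    awk '{ printf "%d\t%s\n", NF, $0 }'
}

# One line of each kind found in `ls -lR` output (hypothetical sample).
printf '%s\n' \
    '/usr/local/bin:' \
    'total 8' \
    '' \
    '-rwxr-xr-x 1 root root 1024 Jan  1 12:00 tool' |
    show_nf
```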

Get bash Cookbook now with the O’Reilly learning platform.
