An Acronym Processor

Now let’s look at a program that scans a file for acronyms and replaces each one with its full-text description, followed by the acronym in parentheses. If a line refers to “BASIC,” we’d like to replace it with the description “Beginner’s All-Purpose Symbolic Instruction Code” and put the acronym in parentheses afterward. (This is probably not a useful program in and of itself, but the techniques it uses are general and have many other applications.)
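The core substitution can be seen in isolation before looking at the complete program. The following is a minimal sketch, not part of awkro itself: a hypothetical shell function, expand_basic, that hard-codes the single BASIC example and rewrites any field that is exactly BASIC.

```shell
# expand_basic - hypothetical helper (not part of awkro) showing the
# core substitution for one hard-coded acronym; reads standard input.
expand_basic() {
	awk '{
		# rewrite any field that is exactly "BASIC"
		for (i = 1; i <= NF; i++)
			if ($i == "BASIC")
				$i = "Beginner’s All-Purpose Symbolic Instruction Code (" $i ")"
		# print every line, changed or not
		print
	}'
}

echo "Learn BASIC today" | expand_basic
# prints: Learn Beginner’s All-Purpose Symbolic Instruction Code (BASIC) today
```

Because the test compares whole fields, a lowercase "basic" or a field such as "BASIC," with trailing punctuation is left alone.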

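We’ll write the acronym processor as a filter that prints every line, whether or not a change has been made, and call it awkro. It first loads a table of acronyms from a file named acronyms, then processes the files named on the command line.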

awk '# awkro - expand acronyms
# load acronyms file into array "acro"
# (each line: acronym, a tab, then its description)
FILENAME == "acronyms" {
	split($0, entry, "\t")
	acro[entry[1]] = entry[2]
	next
}

# process any input line containing two or more consecutive capitals
/[A-Z][A-Z]+/ {

	# see if any field is an acronym
	for (i = 1; i <= NF; i++)
		if ( $i in acro ) {
			# if it matches, add description
			$i = acro[$i] " (" $i ")"
		}
}

{
	# print all lines, changed or not
	print $0
}' acronyms "$@"
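The FILENAME test means awkro requires the table to be a file named acronyms in the current directory. A common awk idiom avoids this: NR counts records across all input files while FNR restarts at 1 for each file, so NR == FNR is true only while the first file is being read. The sketch below applies that idiom in a hypothetical shell function, awkro2 (our name, not part of the original), which takes the table file as its first argument, reads standard input when no other files are given, and otherwise performs the same expansion.

```shell
# awkro2 - hypothetical variant of awkro: the table file is the first
# argument instead of a file that must be named "acronyms".
# Usage: awkro2 table-file [file ...]   (reads standard input if no files)
awkro2() {
	table=$1
	shift
	# with no input files, read standard input ("-" means stdin to awk)
	if [ $# -eq 0 ]; then
		set -- -
	fi
	awk '
	# NR == FNR only while reading the first file, i.e. the table
	# (caveat: if the table file is empty, this holds for the next file too)
	NR == FNR {
		split($0, entry, "\t")
		acro[entry[1]] = entry[2]
		next
	}

	# same expansion step as awkro
	/[A-Z][A-Z]+/ {
		for (i = 1; i <= NF; i++)
			if ($i in acro)
				$i = acro[$i] " (" $i ")"
	}

	# print all lines
	{ print }
	' "$table" "$@"
}
```

A call such as awkro2 acronyms sample behaves like the original awkro run on sample. Two behaviors are shared with awkro: assigning to a field makes awk rebuild $0 with OFS (a single space by default), so on changed lines runs of blanks collapse and trailing blanks disappear, while unchanged lines print exactly as read; and a field with attached punctuation, such as "NASA,", does not match any table entry.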

Let’s first see awkro in action. Here’s a sample input file.

$ cat sample
The USGCRP is a comprehensive 
research effort that includes applied 
as well as basic research.
The NASA program Mission to Planet Earth 
represents the principal space-based component
of the USGCRP and includes new initiatives
such as EOS and Earthprobes.

And here is the file acronyms, with a tab between each acronym and its description:

$ cat acronyms
USGCRP	U.S. Global Change Research Program
NASA	National Aeronautics and Space Administration
EOS	Earth Observing System

Now we run ...

Get sed & awk, 2nd Edition now with the O’Reilly learning platform.
