
Linux Server Hacks by Rob Flickenger


Hack #6. Building Complex Command Lines

Build simple commands into full-fledged paragraphs for complex (but meaningful) reports

Studying Linux (or indeed any Unix) is much like studying a foreign language. At some magical point in the course of one's studies, halting monosyllabic mutterings begin to meld together into coherent, often used phrases. Eventually, one finds himself pouring out entire sentences and paragraphs of the Unix Mother Tongue, with one's mind entirely on the problem at hand (and not on the syntax of any particular command). But just as high school foreign language students spend much of their time asking for directions to the toilet and figuring out just what the dative case really is, the path to Linux command-line fluency must begin with the first timidly spoken magic words.

Your shell is very forgiving, and will patiently (and repeatedly) listen to your every utterance, until you get it just right. Any command can serve as the input for any other, making for some very interesting Unix "sentences." When armed with the handy (and probably over-used) up arrow, it is possible to chain together commands with slight tweaks over many tries to achieve some very complex behavior.

For example, suppose that you're given the task of finding out why a web server is throwing a bunch of errors over time. If you type less error_log, you see that there are many "soft errors" relating to missing (or badly linked) graphics:

[Tue Aug 27 00:22:38 2002] [error] [client 17.136.12.171] File does not 
exist: /htdocs/images/spacer.gif
[Tue Aug 27 00:31:14 2002] [error] [client 95.168.19.34] File does not 
exist: /htdocs/image/trans.gif
[Tue Aug 27 00:36:57 2002] [error] [client 2.188.2.75] File does not 
exist: /htdocs/images/linux/arrows-linux-back.gif
[Tue Aug 27 00:40:37 2002] [error] [client 2.188.2.75] File does not 
exist: /htdocs/images/linux/arrows-linux-back.gif
[Tue Aug 27 00:41:43 2002] [error] [client 6.93.4.85] File does not 
exist: /htdocs/images/linux/hub-linux.jpg
[Tue Aug 27 00:41:44 2002] [error] [client 6.93.4.85] File does not 
exist: /htdocs/images/xml/hub-xml.jpg
[Tue Aug 27 00:42:13 2002] [error] [client 6.93.4.85] File does not 
exist: /htdocs/images/linux/hub-linux.jpg
[Tue Aug 27 00:42:13 2002] [error] [client 6.93.4.85] File does not 
exist: /htdocs/images/xml/hub-xml.jpg

and so on. Running a logging package (like analog) reports exactly how many errors you have seen in a day but few other details (which is how you were probably alerted to the problem in the first place). Looking at the logfile directly gives you every excruciating detail but is entirely too much information to process effectively.

Let's start simple. Are there any errors other than missing files? First we'll need to know how many errors we've had today:

$ wc -l error_log
1265 error_log

And how many were due to File does not exist errors?

$ grep "File does not exist:" error_log | wc -l
1265
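
Both counts agree, but it costs nothing to confirm directly that no other kinds of errors are hiding in there. Inverting the match and counting should come up empty (a quick sanity check, assuming the same error_log as above):

```shell
$ grep -cv "File does not exist:" error_log
0
```

grep's -v inverts the match and -c counts the result, so a 0 here means every error line is a missing-file complaint.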

That's a good start. At least we know that we're not seeing permission problems or errors in anything that generates dynamic content (like CGI scripts). If every error is due to missing files (or typos in our HTML that point to the wrong file), then it's probably not a big problem. Let's generate a list of the filenames of all bad requests. Hit the up arrow and delete that wc -l:

$ grep "File does not exist:" error_log | awk '{print $13}' | less

That's the sort of thing that we want (the 13th field, just the filename), but hang on a second. The same couple of files are repeated many, many times. Sure, we could email this to the web team (all whopping 1265 lines of it), but I'm sure they wouldn't appreciate the extraneous spam. Printing each file exactly once is easy:

$ grep "File does not exist:" error_log | awk '{print $13}' | sort | uniq |
less

This is much more reasonable (substitute a wc -l for that less to see just how many unique files have been listed as missing). But that still doesn't really solve the problem. Maybe one of those files was requested once, but another was requested several hundred times. Naturally, if there is a link somewhere with a typo in it, we would see many requests for the same "missing" file. But the previous line doesn't give any indication of which files are requested most. This isn't a problem for bash; let's try out a command line for loop.

$ for x in `grep "File does not exist" error_log | awk '{print $13}' | sort | uniq`; do \
echo -n "$x : "; grep $x error_log | wc -l; done

We need those backticks (`) to actually execute our entire command from the previous example and feed its output to a for loop. On each pass through the loop, the $x variable is set to the next word of that output (that is, the next unique filename reported as missing). We then grep for that filename in the error_log and count how many times it appears. The echo at the start of the loop body prints the filename prefix, giving us a somewhat nice report format.

I call it a somewhat nice report because not only is it full of single hit errors (which we probably don't care about), the output is very jagged, and it isn't even sorted! Let's sort it numerically, with the biggest hits at the top, numbers on the left, and only show the top 20 most requested "missing" files:

$ for x in `grep "File does not exist" error_log | awk '{print $13}' | sort | uniq`; do \
grep $x error_log | wc -l | tr -d '\n'; echo " : $x"; done | sort -rn | head -20

That's much better and not even much more typing than the last try. We need the tr to eliminate the trailing newline at the end of wc's output (why it doesn't have a switch to do this, I'll never know). Your output should look something like this:

595 : /htdocs/images/pixel-onlamp.gif.gif
156 : /htdocs/image/trans.gif
139 : /htdocs/images/linux/arrows-linux-back.gif
68 : /htdocs/pub/a/onjava/javacook/images/spacer.gif
50 : /htdocs/javascript/2001/03/23/examples/target.gif

From this report, it's very simple to see that almost half of our errors are due to a typo on a popular web page somewhere (note the repeated .gif.gif in the first line). The second is probably also a typo (should be images/, not image/). The rest are for the web team to figure out:

$ ( echo "Here's a report of the top 20 'missing' files in the error_log."; echo; \
for x in `grep "File does not exist" error_log | awk '{print $13}' | sort | uniq`; do \
grep $x error_log | wc -l | tr -d '\n'; echo " : $x"; done | sort -rn | head -20 ) \
| mail -s "Missing file report" webmaster@oreillynet.com

and maybe one hardcopy for the weekly development meeting:

$ for x in `grep "File does not exist" error_log | awk '{print $13}' | sort | uniq`; do \
grep $x error_log | wc -l | tr -d '\n'; echo " : $x"; done | sort -rn | head -20 \
| enscript

Hacking the Hack

Once you get used to chunking groups of commands together, you can chain their outputs together indefinitely, creating any sort of report you like out of a live data stream. Naturally, if you find yourself doing a particular task regularly, you might want to consider turning it into a shell script of its own (or even reimplementing it in Perl or Python for efficiency's sake, as every | in a command means that you've spawned yet another program). On a modern (and unloaded) machine, you'll hardly notice the difference, but it's considered good form to clean up the solution once you've hacked it out. And on the command line, there's plenty of room to hack.
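
For example, the report above can be folded into a small script that makes a single pass over the log with uniq -c, instead of re-running grep once per filename (a sketch; the script name and the field number are assumptions carried over from the examples above):

```shell
#!/bin/sh
# missing-report.sh -- top 20 "missing" files from an Apache error_log
# Usage: missing-report.sh error_log
grep "File does not exist:" "$1" |   # keep only the soft errors
  awk '{print $13}' |                # pull out the filename field
  sort | uniq -c |                   # count each unique filename once
  sort -rn |                         # biggest offenders first
  head -20
```

uniq -c puts the count first without the " : " separator, but the ranking is identical, and the log is read only once no matter how many distinct filenames it contains.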
