The Log-Analysis Script

Now that the hostname lookups are taken care of, it’s time to write the log-analysis script. Example 8-2 shows the first version of that script.

Example 8-2. log_report.plx, a web log-analysis script (first version)

#!/usr/bin/perl -w

# log_report.plx

# report on web visitors

use strict;

while (<>) {
    my ($host, $ident_user, $auth_user, $date, $time,
            $time_zone, $method, $url, $protocol, $status, $bytes) = 
        /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)$/;

    print join "\n", $host, $ident_user, $auth_user, $date, $time,
        $time_zone, $method, $url, $protocol, $status,
        $bytes, "\n";
}

This first version of the script is simple. All it does is read in lines via the <> operator, parse those lines into their component pieces, and then print out the parsed elements for debugging purposes. The line that does the printing is interesting in that it uses Perl’s join function, which you haven’t seen before. The join function is the opposite, so to speak, of the split function: its first argument is a string that is used to join the remaining arguments, taken as a list, into a single scalar string. In other words, the Perl expression join '-', 'a', 'b', 'c' would return the string a-b-c. In this case, using \n to join the various elements parsed by our script lets us print those items one per line. Because the final element of the list is itself "\n", each record’s output ends with a blank line that separates it from the next.
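To make the relationship between join and split concrete, here is a minimal sketch that is separate from log_report.plx. The variable names ($joined, @pieces, $report) and the placeholder strings 'host' and 'status' are illustrative assumptions, not part of the original script.

```perl
#!/usr/bin/perl -w
use strict;

# join glues a list into one string, placing the separator between elements.
my $joined = join '-', 'a', 'b', 'c';
print "$joined\n";                      # prints a-b-c

# split goes the other way: it breaks a string back into a list.
my @pieces = split /-/, $joined;
print scalar(@pieces), " pieces\n";     # prints 3 pieces

# As in log_report.plx, a trailing "\n" element gets a separator placed
# in front of it, so the result ends with two newlines (one blank line).
# 'host' and 'status' are placeholder values for illustration only.
my $report = join "\n", 'host', 'status', "\n";
print $report;
```

Run on its own, this prints a-b-c, then the piece count, then the two placeholder values one per line followed by a blank line, which is the same layout the script uses for each parsed log record.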

The Mammoth Regular Expression

The real juicy part of this script, though, ...
