Different Log File Formats

It’s fairly easy to modify this script to accept either the common or the extended log format. We do that by adding a configuration variable near the top of the script that looks like this:

my $log_format = 'common'; # 'common' or 'extended'

Then we modify the part of the script where the regular expression parsing occurs. We add logic that checks the $log_format variable, along with a second version of the regular expression for logs in the extended format:

            
    # Both patterns match against $_; "or next" skips any line that doesn't fit.
    if ($log_format eq 'common') {

        # host, ident, authuser, [date:time zone], "method URL protocol", status, bytes
        ($host, $ident_user, $auth_user, $date, $time,
            $time_zone, $method, $url, $protocol, $status, $bytes) =
            /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)$/
            or next;

    } elsif ($log_format eq 'extended') {

        # the same fields, plus the quoted referer and user agent
        ($host, $ident_user, $auth_user, $date, $time,
            $time_zone, $method, $url, $protocol, $status, $bytes,
            $referer, $agent) =
            /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+) "([^"]+)" "([^"]+)"$/
            or next;

    } else {
        die "unrecognized log format '$log_format'";
    }

I think this qualifies as the ugliest block of code in this entire book. It is not the sort of code anybody wants to decipher more than once, but fortunately, once we get it right, we aren’t likely to need to modify it.
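If more formats get added later, one way to keep that block from growing another elsif branch each time is to move the patterns into a hash keyed by format name. The following is only a sketch of that idea, not the approach the script itself uses. The %log_patterns hash, the parse_line helper, and the sample log line are hypothetical. The two regular expressions are the same ones from the script.

```perl
use strict;
use warnings;

# Hypothetical table of precompiled patterns, keyed by format name.
# The patterns themselves are the two from the script.
my %log_patterns = (
    common   => qr/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)$/,
    extended => qr/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+) "([^"]+)" "([^"]+)"$/,
);

# Hypothetical helper: returns the captured fields as a list, or an
# empty list if the line doesn't match. Dies on an unknown format name.
sub parse_line {
    my ($log_format, $line) = @_;
    my $pattern = $log_patterns{$log_format}
        or die "unrecognized log format '$log_format'";
    return $line =~ $pattern;
}

# Sample common-format line (made-up data for illustration).
my $sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
           . '"GET /apache_pb.gif HTTP/1.0" 200 2326';

my ($host, $ident_user, $auth_user, $date, $time,
    $time_zone, $method, $url, $protocol, $status, $bytes)
    = parse_line('common', $sample)
    or die "sample line did not match";

print "$host: $method $url -> status $status, $bytes bytes\n";
```

With this arrangement, supporting a new format means adding one entry to the hash. The code that unpacks the captured fields still has to know how many fields each format produces, so the ugliness moves rather than disappears.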

Anyway, you’ll notice that the new regular expression for extended-format logs has a couple of new chunks at the end, both of which look like "([^"]+)". By now that should ...
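To see what that chunk does in isolation, here is a small sketch that tries it on a few made-up strings. The variable names and sample data are hypothetical, not part of the script. It also compares the pattern with a plain greedy `".*"`, assuming a line that holds two quoted fields side by side, as the extended format does.

```perl
use strict;
use warnings;

# The quoted-field chunk from the extended-format regex, anchored here
# so that it has to match a whole string by itself.
my $quoted_field = qr/^"([^"]+)"$/;

# Made-up sample inputs: a URL, a hyphen placeholder, and an empty field.
for my $input ('"http://www.example.com/start.html"', '"-"', '""') {
    if ($input =~ $quoted_field) {
        print "$input -> captured '$1'\n";
    }
    else {
        print "$input -> no match\n";
    }
}

# With two quoted fields on one line, the greedy .* runs on to the
# last quote, while [^"]+ stops at the first closing quote.
my $two_fields = '"http://www.example.com/start.html" "Mozilla/4.08"';
my ($greedy)  = $two_fields =~ /"(.*)"/;
my ($negated) = $two_fields =~ /"([^"]+)"/;
print "greedy:  $greedy\n";
print "negated: $negated\n";
```

Because `+` requires at least one character, an empty `""` field does not match. In the script, that means a log line containing an empty quoted field would fail the whole regular expression and be skipped by the `or next`.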
