Storing the Data

Now that we’re successfully parsing out the individual elements from each line in the log file, what are we going to do with them? It’s time to think about what sorts of things we want to keep track of, and how to represent them in our data structure.

One good thing to keep track of is the time of the first and last access processed. When printed out in our report, this will let us see what range of time is covered by the analyzed log file lines.

Another obvious thing to keep track of is how many raw hits are in the log file. Similarly, we can track the total amount of data (in megabytes) sent out by the server, and the number of HTML page views.

We’ll begin implementing these features by adding the following to the top of the log_report.plx script, just before the start of the while loop that parses the log file lines:

my($begin_time, $end_time, $total_hits, $total_mb, $total_views);

This declares five scalar variables that are visible from this point to the end of the script, including inside the while loop. We'll use them to store the various categories of information we're interested in tracking.
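To see how a declaration like this behaves, here is a small standalone sketch. It is not part of log_report.plx; the variable names $first_seen and $count and the loop data are made up for illustration. It shows that scalars declared with my at the top of a file start out undefined and can still be read and updated inside a later loop block.

```perl
use strict;
use warnings;

# Declared at file scope, so these are visible in every block that follows.
my ($first_seen, $count);

# Freshly declared scalars hold undef until something is assigned to them.
print defined $first_seen ? "defined\n" : "undefined\n";

foreach my $item ('a', 'b', 'c') {
    # The loop body can read and update the outer variables.
    $first_seen = $item unless defined $first_seen;
    ++$count;
}

print "first=$first_seen count=$count\n";
```

Because the variables begin life undefined (which Perl treats as false), a test such as unless ($first_seen) can be used to detect "nothing stored yet".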

Now, at the end of the while loop, we’ll comment out that debugging print statement and add the new lines shown here in order to store those various pieces of data:

# print join "\n", $host, $ident_user, $auth_user, $date, $time,
#                  $time_zone, $method, $url, $protocol, $status,
#                  $bytes, $referer, $agent, "\n";

unless ($begin_time) {
    $begin_time = "$date:$time";
}
$end_time = "$date:$time";

++$total_hits;
...
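The following standalone sketch, separate from log_report.plx, may help show how these accumulators behave over several records. The sample records, the $total_bytes helper, the treatment of a "-" byte count as zero, and the rule that a page view is a successful GET for a URL ending in .htm or .html are all assumptions made for illustration, not the book's own logic.

```perl
use strict;
use warnings;

my ($begin_time, $end_time, $total_hits, $total_mb, $total_views);
my $total_bytes = 0;    # hypothetical helper: running byte count

# Hypothetical sample records: [date, time, method, url, status, bytes]
my @records = (
    ['12/Mar/2001', '10:00:01', 'GET', '/index.html',   200, 2048],
    ['12/Mar/2001', '10:00:02', 'GET', '/logo.gif',     200, 1048576],
    ['12/Mar/2001', '10:05:30', 'GET', '/missing.html', 404, '-'],
);

foreach my $rec (@records) {
    my ($date, $time, $method, $url, $status, $bytes) = @$rec;

    # Only the first record sets $begin_time; every record overwrites $end_time.
    unless ($begin_time) {
        $begin_time = "$date:$time";
    }
    $end_time = "$date:$time";

    ++$total_hits;

    # Assumption: a non-numeric byte count such as "-" means nothing was sent.
    $total_bytes += $bytes if $bytes =~ /^\d+$/;

    # Assumption: a page view is a successful GET for an .htm or .html URL.
    ++$total_views
        if $method eq 'GET' and $status == 200 and $url =~ /\.html?$/;
}

$total_views ||= 0;                          # no matching records leaves undef
$total_mb = $total_bytes / (1024 * 1024);    # convert bytes to megabytes

printf "%s to %s: %d hits, %d views, %.2f MB\n",
    $begin_time, $end_time, $total_hits, $total_views, $total_mb;
```

Keeping a raw byte count and converting to megabytes once at the end avoids accumulating rounding error on every line.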
