Reporting the Most Popular Pages

Our log_report.plx script will produce interesting output in its current form, but it could do more. One obvious thing to report on is which pages on the site are the most popular. To accomplish this we need to augment our data structure. First, we add a %views_by_page hash to the script-wide my variable declaration up at the top:

my ($begin_time, $end_time, $total_hits, $total_mb, $total_views,
    $total_visits, %visit_num, %host, %first_time, %last_time,
    %last_seconds, %page_sequence, %referer, %agent,
    %views_by_page);

Using the requested URL as the key, we can increment that hash value at the end of the while loop that processes log file lines, just before the invocation of the &store_line subroutine. Thus, the last part of the while loop will look like this:

    ++$views_by_page{$url};
    &store_line($host, $date, $time, $url, $referer, $agent);
}
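
To see the counting idiom in isolation, here is a minimal standalone sketch. It is not part of log_report.plx: the @sample_urls list is a hypothetical stand-in for the $url values the script parses from each log line.

```perl
use strict;
use warnings;

# Hypothetical sample of requested URLs, standing in for the $url
# value that the real script extracts from each log file line.
my @sample_urls = ('/index.html', '/about.html', '/index.html',
                   '/contact.html', '/index.html', '/about.html');

my %views_by_page;
foreach my $url (@sample_urls) {
    # The first increment of a missing key creates it with a value of 1.
    ++$views_by_page{$url};
}

# Print the tallies in alphabetical order by URL.
foreach my $url (sort keys %views_by_page) {
    print "$url: $views_by_page{$url}\n";
}
```

Because Perl creates the hash entry on first use, there is no need to check whether a given URL has been seen before incrementing its count.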

Now that we’ve stored the data in that hash, it’s time to modify the script to give us a report on the most popular pages. First, though, we need to decide how far down the list of most popular pages the report should go. We should probably make that a configuration variable up at the top of the script, which we can do with something like this:

my $depth            = 20;  # how deep to go in reporting top N pages
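
As a rough illustration of how a depth setting like this can cap a ranked list, the following standalone sketch sorts a hash by its values and keeps only the top entries. It is an assumption about one reasonable approach, not the script's actual report code: the view counts are made up, and the @ranked and @top arrays are names introduced here for illustration.

```perl
use strict;
use warnings;

my $depth = 3;   # smaller than the script's 20, to suit the sample data

# Hypothetical view counts, standing in for a populated %views_by_page.
my %views_by_page = (
    '/index.html'   => 42,
    '/about.html'   => 17,
    '/contact.html' => 5,
    '/news.html'    => 17,
    '/faq.html'     => 9,
);

# Rank URLs by descending view count; break ties alphabetically so the
# order is the same on every run.
my @ranked = sort {
    $views_by_page{$b} <=> $views_by_page{$a} or $a cmp $b
} keys %views_by_page;

# Keep at most $depth entries, allowing for fewer pages than $depth.
my $last = $#ranked < $depth - 1 ? $#ranked : $depth - 1;
my @top  = @ranked[0 .. $last];

my $rank = 0;
foreach my $url (@top) {
    printf "%2d. %-20s %d\n", ++$rank, $url, $views_by_page{$url};
}
```

Guarding the slice against a list shorter than $depth avoids undefined entries when the site has fewer distinct pages than the configured depth.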

Now we can put the following immediately after the close SUMMARY line, toward the end of the script proper:

close SUMMARY or die "couldn't close $summary_file after writing: $!\n"; ...
