11.13. Parsing a Web Server Log File

Problem

You want to do calculations based on the information in your web server’s access log file.

Solution

Open the file and parse each line with a regular expression that matches the log file format. This regular expression matches the NCSA Combined Log Format:

$pattern = '/^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" ([0-9\-]+) ([0-9\-]+) "(.*)" "(.*)"$/';
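To see which capture group holds which field, you can run the pattern against a single line. The following is a minimal sketch that assumes PHP 7.1 or later. `parse_log_line()` is a hypothetical helper. Its field names are labels chosen to match the variable names in the Discussion program, and the sample line is made up for illustration.

```php
<?php
// The pattern from the Solution, on a single line.
const COMBINED_LOG_PATTERN = '/^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" ([0-9\-]+) ([0-9\-]+) "(.*)" "(.*)"$/';

// Hypothetical helper: name capture groups 1..11 in the order they appear
// in the pattern. Returns null if the line does not match the format.
function parse_log_line(string $line): ?array
{
    $fields = ['remote_host', 'logname', 'user', 'time', 'method', 'request',
               'protocol', 'status', 'bytes', 'referer', 'user_agent'];
    if (!preg_match(COMBINED_LOG_PATTERN, $line, $matches)) {
        return null;
    }
    // $matches[0] is the whole line; the fields start at index 1.
    return array_combine($fields, array_slice($matches, 1));
}

// A made-up sample line, for illustration only.
$line = '192.0.2.10 - alice [10/Oct/2023:13:55:36 -0700] '
      . '"GET /index.html HTTP/1.1" 200 2326 '
      . '"http://example.com/start.html" "Mozilla/5.0"';

// Print each field name next to the text its group captured.
foreach (parse_log_line($line) as $name => $value) {
    printf("%-11s %s\n", $name, $value);
}
```

Printing each pair shows, for example, that group 6 (`request`) holds `/index.html`. This is the field the Discussion program counts.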

Discussion

This program parses lines in the NCSA Combined Log Format and displays a list of pages sorted by the number of requests for each page:

$log_file = '/usr/local/apache/logs/access.log';
$pattern = '/^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" ([0-9\-]+) ([0-9\-]+) "(.*)" "(.*)"$/';

$fh = fopen($log_file,'r') or die("Can't open $log_file");
$i = 1;
$requests = array();

// read each line; fgets() returns false at end of file
while (($line = fgets($fh,16384)) !== false) {
    // trim off leading/trailing whitespace and skip blank lines
    if ($s = trim($line)) {
        // match the line to the pattern
        if (preg_match($pattern,$s,$matches)) {
            /* put each part of the match in an appropriately-named
             * variable */
            list($whole_match,$remote_host,$logname,$user,$time,
                 $method,$request,$protocol,$status,$bytes,$referer,
                 $user_agent) = $matches;
            // keep track of the count of each request, starting each
            // new request at zero to avoid an undefined-index notice
            if (! isset($requests[$request])) {
                $requests[$request] = 0;
            }
            $requests[$request]++;
        } else {
            // complain if the line didn't match the pattern
            error_log("Can't parse line $i: $s");
        }
    }
    $i++;
}
fclose($fh) or die("Can't close $log_file");

// sort the array (in reverse) by number of requests
arsort($requests);

// print formatted results
foreach ($requests ...
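You can separate the counting in the program above from the file handling so that it works on any sequence of lines. The following is a minimal sketch that assumes PHP 7.1 or later. `count_requests()` is a hypothetical helper name, and the sample lines are made up. Like the program above, it counts group 6 (the requested path), skips blank lines, logs lines that don't match, and sorts with `arsort()`.

```php
<?php
// The pattern from the Solution, on a single line.
const COMBINED_LOG_PATTERN = '/^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" ([0-9\-]+) ([0-9\-]+) "(.*)" "(.*)"$/';

// Hypothetical helper: count requests per path from any iterable of log
// lines and return the counts sorted from most to least requested.
function count_requests(iterable $lines): array
{
    $requests = [];
    $line_number = 0;
    foreach ($lines as $line) {
        $line_number++;
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        if (!preg_match(COMBINED_LOG_PATTERN, $line, $matches)) {
            error_log("Can't parse line $line_number: $line");
            continue;
        }
        $request = $matches[6]; // group 6 is the requested path
        $requests[$request] = ($requests[$request] ?? 0) + 1;
    }
    arsort($requests); // highest counts first, keys preserved
    return $requests;
}

// Made-up sample lines, for illustration only.
$sample = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '192.0.2.11 - - [10/Oct/2023:13:56:01 -0700] "GET /about.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '192.0.2.12 - - [10/Oct/2023:13:57:12 -0700] "GET /index.html HTTP/1.1" 304 - "-" "Mozilla/5.0"',
];

// Print each count right-aligned in a six-character column, then the path.
foreach (count_requests($sample) as $request => $count) {
    printf("%6d  %s\n", $count, $request);
}
```

An array keeps the example easy to check. Because the parameter is `iterable`, you could instead pass a real log as `new SplFileObject($log_file)`, which yields the file one line at a time.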
