Apache Logfiles

Like many people, I have a weblog and a considerable amount of other online writing sitting on my own web server. Once I’ve written something and it’s been indexed by the search engines, people tend to find it, and I feel a vague social responsibility to keep it up there. But things being as they are, and me tinkering as I do, documents go missing. Keeping track of the “404 Page Not Found” errors returned by your server is, therefore, a good thing to do.

This script goes through a standard Apache logfile and produces an RSS 2.0 feed of the pages that visitors requested but the server could not find.
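As a rough illustration of the idea, picking the 404s out of a logfile might look something like the sketch below. It assumes the log lines follow Apache's Common or Combined Log Format. The helper name missing_path and the sample lines are invented for this example and are not part of the script itself.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the requested path if the log line
# records a 404, or undef otherwise.
# Assumes Apache's Common/Combined Log Format.
sub missing_path {
    my ($line) = @_;

    # host ident user [timestamp] "METHOD path protocol" status bytes ...
    if ( $line =~ m{^\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3}) } ) {
        return $1 if $2 eq '404';
    }
    return;
}

# Invented sample lines, for illustration only.
my @sample = (
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /old/page.html HTTP/1.0" 404 209',
    '192.0.2.1 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326',
);

for my $line (@sample) {
    my $path = missing_path($line);
    print "Missing: $path\n" if defined $path;
}
```

Only the first sample line has a 404 status, so only its path is reported. A server configured with a custom LogFormat would need a different pattern.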

Walking Through the Code

Let’s start with the usual Perl good form of use strict; and use warnings; and then load up the marvellous Date::Manip module. This is perhaps a little overkill for its use here, but it does allow for some extremely simple and readable code. We’re producing RSS, so XML::RSS is naturally required, and this is a CGI application, so we need the CGI module too.

use strict;
use warnings;
use Date::Manip;
use XML::RSS;
use CGI qw(:standard);

First off, let’s set up the feed. Because this is the simplest possible form of RSS (just a list, really), it is a perfect fit for RSS 2.0. Then we give it a nice title, link, and description, as per the specification:

my $rss = XML::RSS->new( version => '2.0' );
$rss->channel(
    title       => "Missing Files",
    link        => "http://www.example.org",
    description => "Files found to be missing from my server"
);
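To show where this is heading, here is a minimal sketch of how entries might be added to such a feed and the result printed. It is not the script's own code: the @missing list is invented data standing in for whatever the logfile yields, and the item titles and descriptions are illustrative choices. Only the add_item and as_string methods of XML::RSS are used.

```perl
use strict;
use warnings;
use XML::RSS;

my $rss = XML::RSS->new( version => '2.0' );
$rss->channel(
    title       => "Missing Files",
    link        => "http://www.example.org",
    description => "Files found to be missing from my server",
);

# Hypothetical data: in a real run these paths would come from the logfile.
my @missing = ( '/old/page.html', '/drafts/essay.html' );

# One feed item per missing path.
for my $path (@missing) {
    $rss->add_item(
        title       => $path,
        link        => "http://www.example.org$path",
        description => "A visitor requested $path and received a 404",
    );
}

# Serialize the whole feed as an XML string and print it.
my $feed = $rss->as_string;
print $feed;
```

Run as a CGI application, the script would also need to send a content-type header before the feed itself; that step is left out here to keep the sketch short.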

On my host at least, logfiles ...
