Looking for Links

Now, let’s use find_files.plx as the starting point for a new script, link_check.plx, that will do some simplistic checking for broken links in the HTML files it processes. The first step is to modify the &process subroutine so that, instead of just printing the names of the HTML files it processes, it opens each one and reads its contents:

sub process {

    # this is invoked by File::Find's find function for each
    # file it recursively finds.

    return unless /\.html$/;
    my $file = $File::Find::name;
    unless (open IN, '<', $file) {
        warn "can't open $file for reading: $!, continuing...\n";
        return;
    }
    my $data = join '', <IN>; # all the data at once
    close IN;
    return unless $data;
    print "found $file, read the following data:\n\n$data\n";
}

Looking at the new parts line by line, we can see that the package variable $File::Find::name is assigned to a my variable called $file. This is just for convenience. Since we’ll be using that variable several times, typing $file is going to be easier than typing $File::Find::name over and over again.
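To see why the longer variable name matters, here is a minimal sketch. By default, File::Find changes into each directory as it descends, so inside the callback $_ holds only the bare filename, while $File::Find::name holds the starting directory plus the path down to the file. The temporary directory, the file names, and the @seen array are made up for this illustration and are not part of link_check.plx.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a throwaway tree: <tmp>/docs/index.html (hypothetical names).
my $top = tempdir(CLEANUP => 1);
mkdir "$top/docs" or die "can't mkdir $top/docs: $!";
open my $fh, '>', "$top/docs/index.html" or die "can't create file: $!";
print $fh "<html></html>\n";
close $fh;

# Record both names for every HTML file that find() hands us.
my @seen;   # each entry is [ $_, $File::Find::name ]
find(sub {
    return unless /\.html$/;
    push @seen, [ $_, $File::Find::name ];
}, $top);

print "short name: $seen[0][0]\n";   # just the filename
print "full name:  $seen[0][1]\n";   # starting directory plus path
```

The short name is fine for the /\.html$/ test, but messages that mention a file are more useful with the full name, which is why the subroutine copies it into $file.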

Next, we open the file for reading, associating the IN filehandle with it in the open statement. Notice that we’re using warn rather than die to report a failed open. A die would end the whole script at the first unreadable file; warn prints the message to standard error, and the return that follows simply skips that file. We probably want the script to keep processing the remaining files even if something strange happens in the middle and one of them can’t be opened for reading.
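The warn-and-carry-on pattern can be tried outside of File::Find with a sketch like the following. It is only an illustration under assumed names: the temporary directory, the two file names, and the @read array are hypothetical, and a plain loop with next stands in for the return used inside the subroutine.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create one readable file; the other name deliberately does not exist.
my $dir  = tempdir(CLEANUP => 1);
my $good = "$dir/good.html";
open my $out, '>', $good or die "can't create $good: $!";
print $out "<a href=\"x.html\">x</a>\n";
close $out;

my @read;   # files we managed to read
for my $file ("$dir/missing.html", $good) {
    my $in;
    unless (open $in, '<', $file) {
        # Report the problem, then move on to the next file.
        warn "can't open $file for reading: $!, continuing...\n";
        next;
    }
    my $data = join '', <$in>;   # all the data at once
    close $in;
    push @read, $file if length $data;
}

print "read ", scalar(@read), " file(s)\n";
```

Run as written, the script prints a warning for the missing file and still goes on to read the good one. With die in place of warn, it would stop before reaching the second file.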

Next ...
