Looking for Links
Now, let’s use find_files.plx as the starting point for a new script, link_check.plx, that will do some simplistic checking for broken links in the HTML files it processes. The first step in doing that is to modify the &process subroutine so that instead of just printing out the names of the HTML files it processes, it opens up each one and reads its contents. We can achieve that by modifying the process subroutine as follows:
sub process {
    # this is invoked by File::Find's find function for each
    # file it recursively finds.
    return unless /\.html$/;
    my $file = $File::Find::name;
    unless (open IN, $file) {
        warn "can't open $file for reading: $!, continuing...\n";
        return;
    }
    my $data = join '', <IN>; # all the data at once
    close IN;
    return unless $data;
    print "found $file, read the following data:\n\n$data\n";
}
Looking at the new parts line by line, we can see that the package variable $File::Find::name is assigned to a my variable called $file. This is just for convenience. Since we’ll be using that variable several times, typing $file is going to be easier than typing $File::Find::name over and over again.
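To make the role of that variable concrete, here is a minimal, self-contained sketch. It is not part of find_files.plx: the temporary directory tree, the file names, and the @found array are all hypothetical, built here only so the example can run on its own. Inside the callback, File::Find sets $_ to the bare filename and $File::Find::name to the path including the directory, which is why the filename test uses $_ while the copy kept for later use comes from $File::Find::name.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a small throwaway tree (hypothetical file names).
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/sub" or die "can't create $dir/sub: $!";
foreach my $path ("$dir/index.html", "$dir/sub/page.html", "$dir/notes.txt") {
    open OUT, ">$path" or die "can't write $path: $!";
    print OUT "<html></html>\n";
    close OUT;
}

my @found;    # hypothetical: collects the paths instead of printing them

sub process {
    return unless /\.html$/;         # $_ holds the bare filename
    my $file = $File::Find::name;    # path including the directory
    push @found, $file;
}

find(\&process, $dir);

# Print each path relative to the temporary directory.
foreach my $file (sort @found) {
    (my $relative = $file) =~ s/^\Q$dir\E\///;
    print "$relative\n";
}
```

The non-HTML file is skipped by the pattern match, so only the two .html paths end up in @found.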
Next we open the file for reading, associating the IN filehandle with it in the open statement. Notice how we’re using warn rather than die to report failed open operations. The idea here is that we probably want the script to continue its processing of files even if something strange happens in the middle and one of them can’t be opened for reading.
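The effect of that choice can be sketched outside of File::Find. The following is a minimal illustration rather than the book's script: count_readable and the file names are hypothetical, and the second name is assumed not to exist. Because a failed open only triggers warn, the loop moves on to the next file; with die in its place, the script would stop at the first failure.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: try to open each file, warn about failures,
# and return how many files could be opened.
sub count_readable {
    my @files  = @_;
    my $opened = 0;
    foreach my $file (@files) {
        unless (open IN, $file) {
            # warn writes to STDERR and lets the loop carry on;
            # die here would end the whole script instead.
            warn "can't open $file for reading: $!, continuing...\n";
            next;
        }
        $opened++;
        close IN;
    }
    return $opened;
}

# One real (temporary) file and one name assumed not to exist.
my ($fh, $good) = tempfile(UNLINK => 1);
close $fh;

my $count = count_readable($good, "$good.missing");
print "opened $count of 2 files\n";
```

The warning for the missing file goes to STDERR, and the final print still runs, which is the behavior the process subroutine relies on when it returns after a failed open.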
Next ...