Using CPAN

As I previously pointed out, this first link-checking script is fairly limited. It only checks links that point to the local filesystem, and it will be confused by HTML pages containing things like <BASE HREF="..."> tags, which modify how the relative links on a page are resolved by a browser. Still, it runs quickly, and on a big site that doesn’t violate its assumptions it makes short work of checking for at least the more obvious broken links.

A nice enhancement would be to make it check offsite links as well, using HTTP to request pages just like a web browser. We could write our own web browsing code to do this using Perl, but fortunately that work has already been done, and done better than you or I are likely to be able to do it. The person responsible for that is a very helpful member of the extended Perl community named Gisle Aas, author of the LWP module (short for libwww-perl).
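To give a rough idea of what checking an offsite link over HTTP might look like, here is a minimal sketch. It assumes LWP is already installed, and the link_ok helper and the ten-second timeout are my own illustrative choices, not part of the link-checking script discussed above. It sends a HEAD request for each URL given on the command line, falls back to a GET if that fails (some servers refuse HEAD), and reports whether the final response was successful.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;   # assumption: LWP is installed on this machine

# Hypothetical helper: returns true if the URL can be fetched successfully.
sub link_ok {
    my ($ua, $url) = @_;

    # Try a lightweight HEAD request first.
    my $response = $ua->head($url);

    # Some servers reject HEAD, so retry with GET before giving up.
    $response = $ua->get($url) unless $response->is_success;

    return $response->is_success;
}

# The timeout value is an arbitrary choice for this sketch.
my $ua = LWP::UserAgent->new(timeout => 10);

# Check each URL given on the command line.
for my $url (@ARGV) {
    print link_ok($ua, $url) ? "OK   " : "BAD  ", $url, "\n";
}
```

Saved under a name of your choosing, it could be run with one or more URLs as arguments. A real link checker would want to do more than this, such as reporting the status line for failures and avoiding repeat requests for the same URL.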

Using LWP will save us vast amounts of time and headache. Since it is not currently included in the standard Perl distribution, though, we will need to download it from CPAN (the Comprehensive Perl Archive Network, at http://www.cpan.org/), and install it (assuming it isn’t already installed as part of the copy of Perl we are using). Learning to do that will take some initial effort, but believe me, we’ll be better off in the long run for having invested that time up front.

Checking for LWP

Before we jump in and start the download-and-install process, make the following quick check to ...
