Web Site Checker

I run a small web site (http://www.oualline.com), and I want to make sure that it is consistent. In other words, I want to check for:

  • Broken links

  • Files in the HTML tree that are never referenced (orphans)

This section shows you how to write a program to do this. First, the program needs to go through the web pages and extract the links. A Perl module does most of that: HTML::SimpleLinkExtor.
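
Here is a minimal sketch of the extraction step. It assumes the module's documented `new`, `parse_file`, and `links` methods, and the subroutine name `extract_links` is hypothetical. No base URL is passed to `new`, so links come back exactly as they are written in the page. That makes it easy for later steps to tell internal (relative) links from external ones.

```perl
use strict;
use warnings;
use HTML::SimpleLinkExtor;

# Hypothetical helper: return every link (a href, img src, etc.)
# found in one HTML file.  No base URL is given, so relative links
# are returned as written; resolving them is left to the caller.
sub extract_links {
    my ($html_file) = @_;
    my $extor = HTML::SimpleLinkExtor->new();
    $extor->parse_file($html_file);
    return $extor->links();
}
```

Called once per page, for example `my @links = extract_links('index.html');`, it yields the raw material for both the broken-link and the orphan checks.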

You also need to go through the file tree and get a list of all the files. A Perl module also does that: File::Find.
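
This sketch walks the file tree with `find` and the `no_chdir` option, so `$File::Find::name` is always the full path. The subroutines `list_files` and `find_orphans` are hypothetical, and so is the `%referenced` hash. It is assumed to hold the root-relative path of every internal link target collected from the pages. An orphan is then just a file that is not a key of that hash.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect every plain file under $root, as a
# path relative to $root (e.g. "images/logo.png"), so the result can
# be compared with the link targets found in the pages.
sub list_files {
    my ($root) = @_;
    my @files;
    find(
        {
            no_chdir => 1,    # keep $File::Find::name as a full path
            wanted   => sub {
                return unless -f $File::Find::name;
                (my $rel = $File::Find::name) =~ s{^\Q$root\E/?}{};
                push @files, $rel;
            },
        },
        $root
    );
    return @files;
}

# Hypothetical helper: an orphan is a file that no page links to.
# $referenced is a hash ref whose keys are root-relative link targets.
sub find_orphans {
    my ($root, $referenced) = @_;
    return grep { !exists $referenced->{$_} } list_files($root);
}
```

The site's entry page is normally not linked from any other page, so expect it to show up in the orphan list.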

You also need to make sure that all the referenced files (internal links) exist. A Perl operator does that: -f.
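
This is a sketch of an internal-link check built around `-f`. Before the test can be applied, the link has to be turned into a file name. The text does not specify how, so the rules here are assumptions:

- Any `?query` or `#fragment` suffix is dropped.
- A link starting with `/` is taken relative to the site root.
- Any other link is taken relative to the directory of the page that contains it.
- A link to a directory is assumed to be served by an `index.html` inside it.

The subroutine name `internal_link_ok` is hypothetical. URL-escaped characters such as `%20` are not decoded.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: return 1 if $link, found in the HTML file
# $page, refers to a file that exists under the site root $root.
# Only internal links should be passed in; external URLs are not
# handled here.
sub internal_link_ok {
    my ($root, $page, $link) = @_;

    # Drop any "?query" or "#fragment" part; neither names a file.
    (my $path = $link) =~ s/[?#].*//s;

    # A bare "#anchor" link points back into the same page.
    return 1 if $path eq '';

    # Assumption: "/x.html" is relative to the site root; anything
    # else is relative to the directory holding the linking page.
    my $base = ($path =~ s{^/+}{}) ? $root : dirname($page);
    my $target = File::Spec->catfile($base, $path);

    # Assumption: a link to a directory is served by its index.html.
    $target = File::Spec->catfile($target, 'index.html') if -d $target;

    # The actual check: is there a plain file at that path?
    return (-f $target ? 1 : 0);
}
```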

Finally, you need to check whether the external links are valid. A Perl module does that: LWP::Simple ...
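
A possible external-link check uses LWP::Simple's `head` function. It sends an HTTP HEAD request and returns a true value only when the request succeeds. The name `external_link_ok` is hypothetical. A HEAD request avoids downloading the whole page. Some servers answer HEAD differently from GET, though, so a failure here is better read as "worth a second look" than as proof of a broken link.

```perl
use strict;
use warnings;
use LWP::Simple qw(head);

# Hypothetical helper: return 1 if the external URL answers an
# HTTP HEAD request successfully, 0 otherwise.
sub external_link_ok {
    my ($url) = @_;
    return head($url) ? 1 : 0;
}
```

Many pages tend to link to the same external sites. Caching each result in a hash keyed by URL avoids sending the same request more than once.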
