One of the hard parts of maintaining a large web site is ensuring that all the hypertext links, images, applets, and so forth remain valid as the site grows and changes. It’s easy to make a change somewhere that breaks a link somewhere else, exposing your users to those “Doh!”-producing 404 errors. What’s needed is a program to automate checking the links. This turns out to be surprisingly complex due to the variety of link types. But we can certainly make a start.
Since we already created a program that reads a web page and extracts
the URL-containing tags (Section 17.9), we can use
that here. The basic approach of our new
LinkChecker program is this: given a starting URL, create a
GetURLs object for it. If that succeeds,
read the list of URLs and go from there. This program has the
additional functionality of displaying the structure of the site
using simple indentation in a graphical window, as shown in Figure 17-3.
Figure 17-3. LinkChecker in action
Given the
GetURLs class from Section 17.9, the rest is largely a matter of elaboration.
A lot of this code has to do with the GUI (see Chapter 13). The code uses recursion: the routine
checkOut( ) calls itself each time a new page or
directory is started.
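To make the recursion concrete, here is a minimal sketch of the walk, stripped of the GUI and of all network access. It is not the book's LinkChecker: the class name LinkCheckSketch, the linkSource function (a hypothetical stand-in for GetURLs that returns the links on a page, or null if the page cannot be read), the maxDepth limit, and the visited set are all assumptions made for illustration. Only the method name checkOut( ) and the idea of indenting by depth come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Illustrative sketch of a recursive, indented link walk (not the book's code). */
class LinkCheckSketch {
    /** Hypothetical stand-in for GetURLs: page URL -> links on it, or null if unreadable. */
    private final Function<String, List<String>> linkSource;
    /** Assumed depth limit so the walk cannot run away on a large site. */
    private final int maxDepth;
    /** URLs already examined; guards against pages that link to each other. */
    private final Set<String> visited = new HashSet<>();
    /** One report line per link, indented to show the site structure. */
    private final List<String> report = new ArrayList<>();

    LinkCheckSketch(Function<String, List<String>> linkSource, int maxDepth) {
        this.linkSource = linkSource;
        this.maxDepth = maxDepth;
    }

    /** Walk the site from startUrl and return the indented report. */
    List<String> check(String startUrl) {
        checkOut(startUrl, 0);
        return report;
    }

    /** Check one URL, then call itself for each link found on that page. */
    private void checkOut(String url, int depth) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append("  ");            // two spaces per nesting level
        }
        if (!visited.add(url)) {            // add() returns false if already present
            report.add(indent + url + " (already checked)");
            return;
        }
        List<String> links = linkSource.apply(url);
        if (links == null) {                // the stand-in signals failure with null
            report.add(indent + url + " BROKEN");
            return;
        }
        report.add(indent + url + " OK");
        if (depth >= maxDepth) {
            return;                         // do not descend any further
        }
        for (String link : links) {
            checkOut(link, depth + 1);      // the recursive step
        }
    }

    /** Demo on an in-memory "site"; a real checker would fetch and parse pages. */
    public static void main(String[] args) {
        Map<String, List<String>> site = new HashMap<>();
        site.put("/index.html", Arrays.asList("/a.html", "/missing.html"));
        site.put("/a.html", Arrays.asList("/index.html"));
        for (String line : new LinkCheckSketch(site::get, 5).check("/index.html")) {
            System.out.println(line);
        }
    }
}
```

Run against the small in-memory site in main, the sketch reports /index.html as OK, /a.html as OK one level in, the link back to /index.html as already checked, and /missing.html as BROKEN. Passing the link source in as a function keeps the recursion separate from page fetching, so the same walk could be driven by a real page reader.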
Example 17-8 shows the code for the LinkChecker program.
Example 17-8. LinkChecker.java
/** A simple HTML Link Checker. * Need a Properties file to set depth, ...