To tie everything together, we'll write a simple link collector that visits a website and collects every link on every page it finds on that site. Before we start, though, we'll need some test data. Write a few HTML files that contain links to each other and to other sites on the internet, something like this:
<html>
    <body>
        <a href="contact.html">Contact us</a>
        <a href="blog.html">Blog</a>
        <a href="http://esme.html">My Dog</a>
        <a href="http:///hobbies.html">Some hobbies</a>
        <a href="http:///contact.html">Contact AGAIN</a>
        <a href="http://www.archlinux.org/">Favorite OS</a>
    </body>
</html>
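To get a feel for what the collector will eventually pull out of a page like this, here is a minimal sketch that lists the href of every anchor tag in an HTML string. It uses the standard library's html.parser module, which is an assumption made for illustration and not necessarily the approach the collector itself takes. The names LinkExtractor and extract_links are hypothetical, and the sample string is a shortened version of the page above.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href values found in html_text, in document order."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


# A shortened version of the test page, used only for illustration.
sample = (
    '<html><body>'
    '<a href="contact.html">Contact us</a> '
    '<a href="blog.html">Blog</a> '
    '<a href="http://www.archlinux.org/">Favorite OS</a>'
    '</body></html>'
)

print(extract_links(sample))
# ['contact.html', 'blog.html', 'http://www.archlinux.org/']
```

Note that the result mixes relative links (contact.html) with absolute ones (http://www.archlinux.org/), which is why the test pages should contain both kinds.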
Name one of the files index.html so it shows up first when pages are served. Make sure the other files ...