The first three recipes in this chapter cover different ways of fetching web pages. The techniques they describe work well if you just need to fetch one specific web page, but in the interests of simplicity they omit some details you'll need to consider when writing a web spider, a web browser, or any other serious HTTP client. This recipe creates a library that deals with the details.
Any general client has to be able to make both HTTP and HTTPS requests, but the simple Net::HTTP methods that work in Recipe 14.1 can't be used to make HTTPS requests. Our library will use HTTPRequest objects for everything. If the user requests a URL that uses the "https" scheme, we'll flip the request object's use_ssl switch, as seen in Recipe .
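A minimal sketch of that scheme check, using only standard Net::HTTP calls (the example host is a placeholder; older Rubies need require 'net/https' instead):

```ruby
require 'net/http'
require 'uri'

# Build a connection object and enable SSL only when the URI's
# scheme is "https". URI.parse already picks the right default
# port (80 for http, 443 for https).
uri = URI.parse('https://www.example.com/')
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = (uri.scheme == 'https')

# A subsequent http.request(Net::HTTP::Get.new(uri.request_uri))
# would now travel over SSL.
```

The same two lines work unchanged for plain "http" URLs: use_ssl simply ends up false, so one code path serves both schemes.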
Lots of things can go wrong with an HTTP request: the page might have moved, it might require authentication, or it might simply be gone. Most HTTP errors call for higher-level handling or human intervention, but when a page has moved, a smart client can automatically follow it to its new location.
Our library will automatically follow redirects that provide "Location" fields in their responses. It'll prevent infinite redirect loops by refusing to visit a URL it's already visited. It'll prevent infinite redirect chains by limiting the number of redirects. After all the redirects are followed, it'll make the final URI available as a member of the response object.
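The redirect-handling strategy described above can be sketched as follows. This is an illustration, not the book's final library: the fetch method and its perform_request hook are hypothetical names, and the hook exists only so the redirect logic can be exercised without touching the network.

```ruby
require 'net/http'
require 'uri'

# Follow "Location" redirects, refusing to revisit a URL (loop
# protection) and capping the length of the redirect chain.
# perform_request takes a URI and returns a Net::HTTPResponse.
def fetch(uri_str, limit: 10, seen: [], perform_request: nil)
  raise "Too many redirects" if limit.zero?
  raise "Redirect loop: #{uri_str}" if seen.include?(uri_str)
  seen << uri_str

  uri = URI.parse(uri_str)
  perform_request ||= lambda do |u|
    http = Net::HTTP.new(u.host, u.port)
    http.use_ssl = (u.scheme == 'https')
    http.request(Net::HTTP::Get.new(u.request_uri))
  end

  response = perform_request.call(uri)
  if response.is_a?(Net::HTTPRedirection) && response['Location']
    fetch(response['Location'], limit: limit - 1, seen: seen,
          perform_request: perform_request)
  else
    # Expose the final URI on the response, as the text describes.
    response.define_singleton_method(:final_uri) { uri }
    response
  end
end
```

Checking response.is_a?(Net::HTTPRedirection) catches the whole 3xx family at once, which is simpler than testing individual status codes.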
Many users access the Web through HTTP proxies ...
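Net::HTTP already supports proxies out of the box; a quick sketch, with a made-up proxy host and port standing in for real values:

```ruby
require 'net/http'

# Net::HTTP.Proxy returns a proxy-aware subclass of Net::HTTP whose
# instances behave exactly like normal ones, but route every request
# through the given proxy. 'proxy.example.com' and 8080 are
# placeholder values for illustration.
ProxiedHTTP = Net::HTTP.Proxy('proxy.example.com', 8080)

# Use the returned class just like Net::HTTP itself; no request is
# made until you actually call a method like request or get.
http = ProxiedHTTP.new('www.example.com')
```

Because the proxy-aware class has the same interface as Net::HTTP, a library can decide at runtime whether to instantiate Net::HTTP or a Proxy subclass and treat the result identically from then on.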