The urllib Module

The urllib module provides a unified client interface for HTTP, FTP, and gopher. It automatically picks the right protocol handler based on the uniform resource locator (URL) passed to the library.
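To make the idea of scheme-based dispatch concrete, here is a minimal sketch of how a library could pick a handler from the URL. It is written for Python 3 and is not the module's actual source; the handler functions, the HANDLERS table, and the dispatch function are hypothetical names invented for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical handlers: each one only reports which protocol it would use.
def open_http(url):
    return "HTTP handler for " + url

def open_ftp(url):
    return "FTP handler for " + url

def open_gopher(url):
    return "gopher handler for " + url

# Hypothetical table mapping a URL scheme to its handler.
HANDLERS = {"http": open_http, "ftp": open_ftp, "gopher": open_gopher}

def dispatch(url):
    """Pick a handler based on the URL scheme (the part before the first colon)."""
    scheme = urlsplit(url).scheme  # urlsplit returns the scheme in lowercase
    try:
        handler = HANDLERS[scheme]
    except KeyError:
        raise ValueError("unknown url type: %r" % scheme)
    return handler(url)
```

Calling dispatch("ftp://ftp.python.org/pub") would select open_ftp, while a scheme that is missing from the table raises ValueError.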

Fetching data from a URL takes only a few lines: call the urlopen function and read from the returned stream object, as shown in Example 7-14.

Example 7-14. Using the urllib Module to Fetch a Remote Resource

File: urllib-example-1.py

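import urllib

# open the remote resource; the returned object behaves like a file
fp = urllib.urlopen("http://www.python.org")

op = open("out.html", "wb")

n = 0

# copy the data in 8K chunks, counting the bytes as we go
while 1:
    s = fp.read(8192)
    if not s:
        break
    op.write(s)
    n = n + len(s)

fp.close()
op.close()

# the response headers and the final URL remain available after close
for k, v in fp.headers.items():
    print k, "=", v

print "copied", n, "bytes from", fp.url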

server = Apache/1.3.6 (Unix)
content-type = text/html
accept-ranges = bytes
date = Mon, 11 Oct 1999 20:11:40 GMT
connection = close
etag = "741e9-7870-37f356bf"
content-length = 30832
last-modified = Thu, 30 Sep 1999 12:25:35 GMT
copied 30832 bytes from http://www.python.org
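
The listing above uses the Python 2 form of the module. For readers on Python 3, where urlopen lives in urllib.request, the same copy loop might be sketched as follows. The copy_url function, its chunk_size parameter, and its return value are assumptions made for this illustration and do not appear in the original example.

```python
import urllib.request

def copy_url(url, filename, chunk_size=8192):
    """Copy the resource at url into filename.

    Returns (bytes copied, final URL, headers as a dict).
    """
    n = 0
    # both the response and the output file are closed when the block ends
    with urllib.request.urlopen(url) as fp, open(filename, "wb") as op:
        while True:
            s = fp.read(chunk_size)
            if not s:
                break
            op.write(s)
            n += len(s)
        return n, fp.url, dict(fp.headers.items())

# Example call (needs network access, so it is left commented out):
# n, url, headers = copy_url("http://www.python.org", "out.html")
# print("copied", n, "bytes from", url)
```

Wrapping the loop in a function keeps the byte count, the final URL, and the headers together, so the caller can print them in whatever form it likes.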

Note that the stream object provides some nonstandard attributes: headers is a Message object (as defined by the mimetools module), and url contains the actual URL. The latter is updated if the server redirects the client to a new URL.
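
As a rough illustration of these two attributes, the sketch below reads them from a response in Python 3, where headers is an email.message.Message rather than a mimetools object. The describe function and its return shape are hypothetical, chosen only to keep the example short.

```python
import urllib.request

def describe(url):
    """Return (final URL, content type, list of header pairs) for url."""
    with urllib.request.urlopen(url) as fp:
        headers = fp.headers  # supports case-insensitive lookups
        # fp.url holds the URL actually retrieved, which may differ from
        # the requested one if the server issued a redirect
        return fp.url, headers.get_content_type(), headers.items()

# Example call (needs network access, so it is left commented out):
# final_url, content_type, pairs = describe("http://www.python.org")
# for k, v in pairs:
#     print(k, "=", v)
```

Comparing the returned URL with the one you passed in is a simple way to detect that a redirect took place.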

The urlopen function is actually a helper function, which creates an instance of the FancyURLopener class and calls its open method. To get special behavior, you can subclass that class. For instance, the class in Example 7-15 automatically logs in to the server when ...
