The urllib Module
The urllib module provides a unified client interface for HTTP, FTP, and
Gopher. It automatically picks the right protocol handler based on
the uniform resource locator (URL) passed to the library.
Fetching data from a URL is straightforward: call the urlopen function
and read from the returned stream object, as shown in Example 7-14.
Example 7-14. Using the urllib Module to Fetch a Remote Resource
File: urllib-example-1.py

import urllib

fp = urllib.urlopen("http://www.python.org")

op = open("out.html", "wb")

n = 0

while 1:
    s = fp.read(8192)
    if not s:
        break
    op.write(s)
    n = n + len(s)

fp.close()
op.close()

for k, v in fp.headers.items():
    print k, "=", v

print "copied", n, "bytes from", fp.url

server = Apache/1.3.6 (Unix)
content-type = text/html
accept-ranges = bytes
date = Mon, 11 Oct 1999 20:11:40 GMT
connection = close
etag = "741e9-7870-37f356bf"
content-length = 30832
last-modified = Thu, 30 Sep 1999 12:25:35 GMT
copied 30832 bytes from http://www.python.org
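
Example 7-14 is written for Python 2. As a rough sketch only, the same chunked copy might look like this under Python 3, where urlopen lives in urllib.request. The helper name copy_url and its parameters are made up for illustration and are not part of the library.

```python
from urllib.request import urlopen


def copy_url(url, dest_path, chunk_size=8192):
    """Copy the resource at url to dest_path and return the byte count.

    copy_url is a hypothetical helper, not a urllib function.
    """
    total = 0
    # Both the response stream and the output file are closed on exit.
    with urlopen(url) as response, open(dest_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of stream
                break
            out.write(chunk)
            total += len(chunk)
    return total
```

Calling copy_url("http://www.python.org", "out.html") would mirror the example above. Reading in fixed-size chunks keeps memory use bounded regardless of how large the remote resource is.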
Note that the stream object provides some non-standard attributes:
headers is a Message object (as defined by the mimetools module), and
url contains the actual URL. The latter is updated if the server
redirects the client to a new URL.
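
To make these two attributes concrete, here is a small sketch. It assumes Python 3 rather than the Python 2 described in the text; there the headers object is an email.message.Message (or a subclass) instead of a mimetools Message. The function name describe_response is hypothetical.

```python
from urllib.request import urlopen


def describe_response(url):
    """Return (final_url, headers_dict) for url.

    describe_response is a hypothetical helper, not part of urllib.
    """
    with urlopen(url) as response:
        # Lower-case the header names so the resulting dict has
        # predictable keys whatever casing the server or handler used.
        headers = {name.lower(): value
                   for name, value in response.headers.items()}
        # response.url holds the URL actually fetched, which can differ
        # from the requested one after a redirect.
        return response.url, headers
```

Comparing the returned URL with the one passed in is a simple way to detect that a redirect took place.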
The urlopen function is actually a helper function, which creates an
instance of the FancyURLopener class and calls its open method. To get
special behavior, you can subclass that class. For instance, the class in
Example 7-15 automatically logs in to the server when ...