Chapter 5. Web Clients

This chapter covers the HTTP client side of Twisted Web, starting with quick web resource retrieval for one-off applications and ending with the Agent API for developing flexible web clients.

Basic HTTP Resource Retrieval

Twisted has several high-level convenience functions for quick one-off resource retrieval.

Printing a Web Resource

twisted.web.client.getPage asynchronously retrieves a resource at a given URL. It returns a Deferred, which fires its callback with the resource as a string. Example 5-1 demonstrates the use of getPage; it retrieves and prints the resource at the user-supplied URL.

Example 5-1. print_resource.py
from twisted.internet import reactor
from twisted.web.client import getPage
import sys

def printPage(result):
    # Success path: result is the body of the retrieved resource.
    print result

def printError(failure):
    # Failure path: failure wraps the exception raised during retrieval.
    print >>sys.stderr, failure

def stop(result):
    # Runs after either path, so the reactor always shuts down.
    reactor.stop()

if len(sys.argv) != 2:
    print >>sys.stderr, "Usage: python print_resource.py <URL>"
    sys.exit(1)

d = getPage(sys.argv[1])
d.addCallbacks(printPage, printError)
d.addBoth(stop)

reactor.run()

We can test this script with:

python print_resource.py http://www.google.com

which will print the contents of Google’s home page to the screen.

An invalid URL will produce something like the following:

$ python print_resource.py http://notvalid.foo
[Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.DNSLookupError'>: DNS lookup failed: address 'notvalid.foo' not found: [Errno 8] nodename nor servname provided, or not known. ...
