Chapter 20. Web Automation

The web, then, or the pattern, a web at once sensuous and logical, an elegant and pregnant texture: that is style, that is the foundation of the art of literature.

Robert Louis Stevenson, On Some Technical Elements of Style in Literature (1885)

Introduction

Chapter 19 concentrated on responding to browser requests and producing documents using CGI. This chapter approaches the Web from the other side: instead of responding to a browser, you pretend to be one, generating requests and processing returned documents. We make extensive use of modules to simplify this process because the intricate network protocols and document formats are tricky to get right. By letting existing modules handle the hard parts, you can concentrate on the interesting part—your own program.

The relevant modules can all be found under the following URL:

http://search.cpan.org/modlist/World_Wide_Web

There you’ll find modules for computing credit card checksums, interacting with Netscape or Apache server APIs, processing image maps, validating HTML, and manipulating MIME. The largest and most important modules for this chapter, though, are found in the libwww-perl suite of modules, referred to collectively as LWP. Table 20-1 lists just a few modules included in LWP.

Table 20-1. LWP modules

Module name           Purpose
LWP::UserAgent        WWW user agent class
LWP::RobotUA          Develop robot applications
LWP::Protocol         Interface to various protocol schemes
LWP::Authen::Basic    Handle 401 and 407 responses
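
To make the idea of "pretending to be a browser" concrete, here is a minimal sketch built on LWP::UserAgent from Table 20-1. It is an illustration only, not a recipe from this chapter: the URL, the agent string, the --fetch switch, and the fetch_document helper are all hypothetical names chosen for the example, and the sketch assumes the libwww-perl distribution is installed.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical URL and agent name, used only for illustration.
my $url = 'http://www.example.com/';

# The user agent plays the role of the browser.
my $ua = LWP::UserAgent->new(
    agent   => 'CookbookSketch/0.1',
    timeout => 10,
);

# Build the request a browser would generate for this URL.
my $request = HTTP::Request->new(GET => $url);

# Hypothetical helper: send the request and return the document body,
# or die with the server's status line if the request failed.
sub fetch_document {
    my ($agent, $req) = @_;
    my $response = $agent->request($req);
    die "Fetch failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Contact the network only when explicitly asked to.
if (@ARGV and $ARGV[0] eq '--fetch') {
    print fetch_document($ua, $request);
}
```

Run with the --fetch argument, the script prints the returned document. Without it, the script only constructs the user agent and the request, which is a convenient way to inspect what would be sent before any traffic leaves the machine.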
