Chapter 11. Five Quick Hacks: Downloading Web Pages

Jon Orwant

Dan Gruhl

Sometimes it’s nice to visit web sites without being in front of your computer. Maybe you’d prefer to have the text of web pages mailed to you, or be notified when a web page changes. Or maybe you’d like to download a lot of information from a huge number of web pages (as in the article webpluck), and you don’t want to open them all one by one. Or maybe you’d like to write a robot that scours the web for information. Enter the LWP bundle (sometimes called libwww-perl), which contains two modules that can download web pages for you: LWP::Simple and LWP::UserAgent. LWP is available on CPAN and is introduced in Scripting the Web with LWP.

Dan Gruhl submitted five tiny but exquisite programs to TPJ, all using LWP to automatically download information from a web service. Instead of sprinkling these around various issues as one-liners, I’ve collected all five here with a bit of explanation for each.

The first thing to notice is that all five programs look alike. Each uses an LWP module (LWP::Simple in the first three, LWP::UserAgent in the last two) to store the HTML from a web page in Perl’s default scalar variable $_. Each then applies a series of s/// substitutions to discard the extraneous HTML. The remaining text, which is the part we’re interested in, is displayed on the screen, although it could almost as easily be sent as email using the various Mail modules on CPAN.
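The shared pattern can be sketched in a few lines. This is not one of Dan Gruhl’s programs, only a minimal illustration of the fetch-then-substitute approach described above. The helper name html_to_text and its particular substitutions are assumptions made for the example, and the URL comes from the command line rather than from any specific web service.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original programs): put the HTML in $_
# and whittle it down to plain text with a series of s/// substitutions.
sub html_to_text {
    local $_ = shift;
    s{<script\b.*?</script>}{}gis;   # drop script blocks, contents included
    s/<[^>]*>//g;                    # drop every remaining tag
    s/&nbsp;/ /g;                    # decode two common entities
    s/&amp;/&/g;
    s/[ \t]+/ /g;                    # collapse runs of spaces and tabs
    s/^\s+|\s+$//g;                  # trim leading and trailing whitespace
    return $_;
}

# Fetch a page only when a URL is supplied, so the helper can be used offline.
if (@ARGV) {
    require LWP::Simple;
    my $html = LWP::Simple::get($ARGV[0]);   # returns undef on failure
    die "Could not fetch $ARGV[0]\n" unless defined $html;
    print html_to_text($html), "\n";
}
```

Run with a URL as its only argument, the script prints that page’s text. Real pages usually need substitutions tailored to their layout, which is why each program tends to carry its own list of s/// expressions.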

Downloading Currency Exchange Rates

The currency.pl
