Building Tag Clouds in Perl and PHP by Jim Bumgardner

Collecting Genesis Words in Perl

Our first script, makeGenesisTags.pl, produces a list of the words that appear in the book of Genesis in the Bible. The data is retrieved from the copy of the book of Genesis at the Project Gutenberg web site. To run the script, enter this command:

makeGenesisTags.pl

It will produce a file called genesis.pl. The makeGenesisTags.pl script uses LWP::Simple to screen-scrape the Project Gutenberg web site. Let's see how it works by examining the script:

#!/usr/bin/perl

use HTTP::Cache::Transparent;
use LWP::Simple;
use Data::Dumper;

These lines load the HTTP::Cache::Transparent, LWP::Simple, and Data::Dumper modules. If any of them is not installed, you'll see an error message when you run the script that says something like "Can't locate Data/Dumper.pm in @INC."
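
If you would rather check for the modules before running the script, a small test like the following can help. This is only a sketch: the module_available helper is a name made up for this illustration and is not part of makeGenesisTags.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Data::Dumper -> Data/Dumper.pm
    return eval { require $file; 1 } ? 1 : 0; # eval traps the "Can't locate" error
}

# Report on the three modules the script depends on.
for my $module (qw(HTTP::Cache::Transparent LWP::Simple Data::Dumper)) {
    printf "%-26s %s\n", $module,
        module_available($module) ? 'installed' : 'missing';
}
```

Any module reported as missing can normally be installed from CPAN before you run the script.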

use strict;
use warnings;

The above lines enable Perl's strict and warnings pragmas, which help you catch misspelled variable names and other common problems in your script.
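
To see what strict buys you, here is a minimal sketch that deliberately misspells a variable name. The snippet and its variable names are invented for this illustration. It is compiled inside a string eval so that the error can be captured and printed instead of stopping the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile a snippet containing a typo. Under strict, the undeclared
# variable is a compile-time error, so eval returns undef and sets $@.
my $ok = eval q{
    use strict;
    my $count = 0;
    $cuont = $count + 1;   # typo: $cuont was never declared
    1;
};

my $strict_error = $ok ? '' : $@;
print "strict caught: $strict_error";
```

Without use strict, Perl would quietly create a new global variable named $cuont, and the mistake could go unnoticed.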

$Data::Dumper::Terse = 1;  # avoids $VAR1 = * ; in dumper output

This line prevents Data::Dumper from prefixing its output with the boilerplate text "$VAR1 = ". Without that prefix, we can save the dumped data under a variable name of our own choosing.
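
The difference is easy to see in a short experiment. The sketch below uses made-up sample data (%counts) and a made-up variable name ($tags); neither comes from makeGenesisTags.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my %counts = ( beginning => 1 );   # hypothetical sample data

# Default behavior: the output begins with "$VAR1 = ".
my $default = Dumper(\%counts);

# With Terse set, only the data structure itself is emitted.
$Data::Dumper::Terse = 1;
my $terse = Dumper(\%counts);

print "default:\n$default";

# Prepend a variable name of our own choosing ($tags is just an example).
chomp( my $body = $terse );
print "terse:\n\$tags = $body;\n";
```

Because the terse output is a bare Perl expression, it can be assigned to whatever name suits the file being written.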

HTTP::Cache::Transparent::init( {
  BasePath => './cache',
  NoUpdate => 30*60
} );

The HTTP::Cache::Transparent module provides a simple way to make screen-scraping scripts more efficient. When you read data from a web site, a copy of the data is kept in a cached file. Subsequent reads will use the cached data rather than pulling ...
