A Single-Search Results Tool

To show how the WWW::Search module works, we’ll start with a simple script that runs from the command line. Its arguments specify which search engine to query, what query to submit, which site to look for in the results, and how many results to retrieve. The script, called search_rank.plx, is in Example 18-1.

Example 18-1. Querying search engines from the command line with WWW::Search

#!/usr/bin/perl -w

# search_rank.plx

# using the WWW::Search module, compute the rank of the highest-ranked
# page for a particular site when searching a particular search
# engine for a particular query string.

use strict;

use WWW::Search;
use Getopt::Std;

my %opt;

getopts('s:u:q:m:', \%opt);

unless ($opt{s} and $opt{u} and $opt{q}) {

    die <<"EOF";
Usage: $0 [options]

Required options: -s search_engine
                  -u base_url
                  -q 'search query' 

Optional options: -m max_#_to_retrieve (defaults to 50)

EOF

}

my $max = $opt{m} || 50;

my $search = WWW::Search->new($opt{s});
$search->maximum_to_retrieve($max);

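# Escape regex metacharacters so the base URL is matched literally.
my $base_url = quotemeta($opt{u});
my $rank     = 0;   # 0 means no matching result has been seen yet
my $count    = 1;

$search->native_query(WWW::Search::escape_query($opt{q}));
while (my $result = $search->next_result()) {
    # Record the position of the first result whose URL contains the base URL.
    if (not $rank and $result->url =~ /$base_url/o) {
        $rank = $count;
    }
    print "$count: ", $result->title || $result->url,
        ', ', $result->url, "\n";
    ++$count;
}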

print "Rank: $rank\n";
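
The ranking logic in the loop above can be tried without a network connection. The following is a minimal sketch, not part of the original script: the first_rank subroutine and the list of URLs are hypothetical stand-ins for the results a live search would return, and the sketch assumes the same convention as the script, where a rank of 0 means no result matched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based position of the first URL that
# contains $base_url, or 0 if none does (the same convention the script
# uses when it prints "Rank: 0").
sub first_rank {
    my ($base_url, @urls) = @_;
    my $pattern = quotemeta $base_url;   # match the base URL literally
    my $count   = 1;
    for my $url (@urls) {
        return $count if $url =~ /$pattern/;
        ++$count;
    }
    return 0;
}

# Made-up result URLs, standing in for what next_result() would yield.
my @urls = (
    'http://www.example.org/',
    'http://www.example.com/about.html',
    'http://www.example.com/index.html',
);

print 'Rank: ', first_rank('www.example.com', @urls), "\n";   # prints "Rank: 2"
print 'Rank: ', first_rank('www.example.net', @urls), "\n";   # prints "Rank: 0"
```

Because quotemeta escapes the dots in the base URL, 'www.example.com' cannot accidentally match a host such as 'wwwXexample.com'.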

Using the Getopt::Std Module

As you scan through this script, the first interesting thing you’ll notice is the use of the Getopt::Std module. This is a standard ...
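
As a rough illustration of what the getopts call in Example 18-1 does, here is a small sketch that fills @ARGV by hand rather than reading a real command line. The engine name, URL, and query are invented values used only for illustration. A colon after a letter in the option string means that switch takes an argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# A made-up command line, assigned by hand for illustration only.
# 'ExampleEngine' is a hypothetical engine name.
@ARGV = ('-s', 'ExampleEngine', '-u', 'www.example.com',
         '-q', 'perl books', 'leftover');

my %opt;

# s, u, q, and m each take an argument (note the trailing colons).
getopts('s:u:q:m:', \%opt) or die "Bad options\n";

# -m was not supplied, so fall back to the script's default of 50.
my $max = $opt{m} || 50;

print "engine: $opt{s}\n";
print "url:    $opt{u}\n";
print "query:  $opt{q}\n";
print "max:    $max\n";
print "left:   @ARGV\n";   # getopts removes the switches it processed
```

After the call, each switch's value is available under its letter in %opt, and any arguments that were not switches remain in @ARGV.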
