Hack #39. Search CPAN Modules Locally

Search the CPAN without leaving the command line.

Websites such as http://search.cpan.org/ are fantastic for finding the Perl module you need from the CPAN, but firing up a web browser, navigating to the page, and waiting for the results can be slow.

Similarly, running the CPAN or CPANPLUS shell and doing i search term is also slow. Besides that, you might not even have a network connection.

The Hack

The last time the CPAN or CPANPLUS shell connected to a CPAN mirror it downloaded a file listing every single module—03modlist.data.gz. You can see the file at ftp://cpan.org/modules/03modlist.data.gz. Because you have that local copy, you can parse it, check the modules that match your search terms, and print the results.

Additionally, you can check whether any of them are already installed and highlight them.

#!perl -w

# import merrily
use strict;
use IO::Zlib;
use Parse::CPAN::Modlist;

# get the search pattern
my $pattern    = shift || die "You must pass a pattern\n";
# compile with /i here: flags added when interpolating a qr// object are ignored
my $pattern_re = qr/$pattern/i;

# munge our name
my $self       = $0; $self =~ s!^.*[\\/]!!;

# naughty user
die ("usage : $self <query>\n") unless defined $pattern;

# get where the local modulelist is from CPAN(PLUS?)::Config
my $base;
eval { require CPANPLUS::Config; CPANPLUS::Config->import( ); };
unless ($@)
{
    my $conf = CPANPLUS::Config->new( );
    # different versions have the config in different places
    for (qw(conf _build))
    {
        $base = $conf->{$_}->{base} if exists $conf->{$_};
    }
}

goto SKIP if defined $base;

eval { require CPAN::Config; CPAN::Config->import( ) };

unless ($@)
{
    # silence the "used only once" warning without clobbering the loaded config
    no warnings 'once';
    $base = $CPAN::Config->{'keep_source_where'}."/modules/";
}

goto SKIP if defined $base;

die "Couldn't find where you keep your CPAN Modlist\n";

SKIP:
my $file     = "${base}/03modlist.data.gz";

# open the file and feed it to the mod list parser
my $fh       = IO::Zlib->new($file, "rb")  or die "Cannot open $file\n";
my $ml       = Parse::CPAN::Modlist->new(join "", <$fh>);

# by default we want colour
my $colour   = 1;

# check to see if we have Term::ANSIColor installed
eval { require Term::ANSIColor };

# but if we can't have it then we can't have it
$colour      = 0 if $@;

# now do the actual checking

my $first    = 0;

# check each module
for my $module (map { $ml->module($_) } $ml->modules( ))
{
    my $name = $module->name( );
    my $desc = $module->description( );

    # check to see if the pattern matches the name or desc   
    next unless  $name =~ /$pattern_re/i or $desc =~ /$pattern_re/i;

    # aesthetics
    print "\n-- Results for '$pattern' --\n\n" unless $first++;

    # check to see if it's installed
    eval  "require $name";   

    # print out the title - coloured if possible
    if ( $colour && !$@ )
    {
        print Term::ANSIColor::color('red'),
              "$name\n",
              Term::ANSIColor::color('reset');
    }
    elsif (!$@)
    {
        print "!! $name\n";
    }
    else
    {
        print "$name\n";
    }

    # print out the name and description           
    print "- $desc\n\n";
}

exit 0;

First, the code tries to find the local module list. This can be in several places. It initially checks for CPANPLUS, assuming that anyone who has that installed will use it over the less featureful CPAN. Different versions of CPANPLUS store the file in different locations, so the code checks both.

If that fails, the program performs the same check for CPAN. If that doesn't work, the program ends.

If the file is present, the code uncompresses it with IO::Zlib and passes it to Parse::CPAN::Modlist to parse it.

The next part checks to see if Term::ANSIColor is available. If so, it can highlight installed modules.

The Parse::CPAN::Modlist::modules( ) method returns only the names of modules in the list, so the code must load the appropriate Module object to get at the other metadata. Using map { } in the for loop is incredibly convenient.

For efficiency, there's an early check if the name or description matches the input pattern. Notice how the results banner (Results for '$pattern') only prints if there is at least one result.

The code attempts to require the module to see if it is available. If so, the program must highlight the name with color, if available, or exclamation marks otherwise. Finally, the program prints the description and tries the next module.

Hacking the Hack

There are plenty of ways to improve this program.

Currently it assumes that the CPANPLUS module list is the most up to date. It should probably check both CPANPLUS and CPAN if possible, look for the appropriate 03modlist.data.gz in each case, and push it onto a list of potential candidates before using the most recently modified version.
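
One way to pick among several candidate files is sketched below. The helper name newest_modlist is hypothetical, and the sketch assumes you have already collected the possible paths from the CPANPLUS and CPAN configurations; it only compares modification times.

```perl
use strict;
use warnings;

# Return the most recently modified file among the candidates that exist,
# or undef if none of them exist. (newest_modlist is a made-up name.)
sub newest_modlist {
    my @candidates = @_;

    my ($newest, $newest_mtime);
    for my $file (@candidates) {
        next unless -f $file;

        # element 9 of stat's return list is the modification time
        my $mtime = (stat $file)[9];
        if ( !defined $newest_mtime or $mtime > $newest_mtime ) {
            ($newest, $newest_mtime) = ($file, $mtime);
        }
    }

    return $newest;
}
```

With something like this in place, the two goto SKIP branches could each push a path onto a list, and the program would die only if newest_modlist returned undef.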

This hack also relies on 03modlist.data.gz being up to date. If you don't use the CPAN or CPANPLUS shell regularly, this might not be the case.

There are several possible solutions.

First, the program could just die if the module list is too old. This is the simplest (and most defeatist) solution.

Second, you could write a cron job that periodically updates the module list. This has the advantage that even if you have no network currently available, you know it's still reasonably fresh.

Finally, you could check to see whether the module list is older than a certain threshold. If so, you could warn or force the chosen provider to download a newer one. This has the disadvantage of not working if you cannot connect to your CPAN mirror.
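
The age check itself is short. The following is a minimal sketch, assuming a threshold measured in days; the function name modlist_is_stale and the 30-day default are illustrative choices, not part of the original program.

```perl
use strict;
use warnings;

# Return 1 if the file is missing or was last modified more than
# $max_days days ago, 0 otherwise. (Name and default are hypothetical.)
sub modlist_is_stale {
    my ($file, $max_days) = @_;
    $max_days = 30 unless defined $max_days;

    return 1 unless -e $file;

    # -M gives the file's age in days, measured from when the script started
    my $age_days = -M $file;
    return $age_days > $max_days ? 1 : 0;
}
```

The program could then warn, or ask the chosen provider to refresh its index, whenever modlist_is_stale($file) is true.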

Currently, the code checks both the name and the description—which can produce a lot of useless results. It should be possible to build a more complicated query parser that gives users finer-grained control over the results.
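
As a starting point, here is a sketch of a small query parser. The name: and desc: prefixes are an invented syntax for this example, as is the build_matcher helper; bare terms keep the current behavior of matching either field, and every term must match.

```perl
use strict;
use warnings;

# Build a matcher from terms such as "name:^XML" or "desc:parser".
# The prefixes are a hypothetical syntax; a bare term matches either field.
sub build_matcher {
    my @terms = @_;

    my @tests;
    for my $term (@terms) {
        my ($field, $pattern) = $term =~ /^(name|desc):(.+)$/
            ? ($1, $2)
            : ('any', $term);
        my $re = qr/$pattern/i;

        push @tests, sub {
            my ($name, $desc) = @_;
            $desc = '' unless defined $desc;
            return $name =~ $re if $field eq 'name';
            return $desc =~ $re if $field eq 'desc';
            return $name =~ $re || $desc =~ $re;
        };
    }

    # a module is a hit only if every term matches
    return sub {
        my ($name, $desc) = @_;
        for my $test (@tests) {
            return 0 unless $test->($name, $desc);
        }
        return 1;
    };
}
```

In the main loop, the existing next unless line would become something like next unless $matcher->($name, $desc), with $matcher built once from @ARGV.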

Finally, the code doesn't necessarily have to require modules to see if they exist. It could use logic similar to perldoc -l [Hack #2] to find their locations.
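
A minimal sketch of that idea follows. It simply walks @INC looking for the file that require would load; module_path is a hypothetical helper name, and the sketch ignores .pod files and any hooks stored in @INC.

```perl
use strict;
use warnings;
use File::Spec;

# Return the path of the first file in @INC that "require $name" would
# load, or undef if there is none. Nothing is compiled or executed.
sub module_path {
    my ($name) = @_;

    # Foo::Bar lives in Foo/Bar.pm relative to an @INC directory
    my @parts = split /::/, $name;
    return unless @parts;
    $parts[-1] .= '.pm';

    for my $dir (@INC) {
        next if ref $dir;    # skip code refs and objects placed in @INC
        my $path = File::Spec->catfile($dir, @parts);
        return $path if -f $path;
    }

    return;
}
```

Replacing eval "require $name" with defined module_path($name) avoids compiling every matching module, and avoids running any code those modules execute at load time.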
