Perl & LWP by Sean M. Burke

Rewrite for Features

My core approach in these cases is to pick some set of assumptions and stick with it, but also to assume that they will fail. So I write the code so that when they do fail, the point of failure is easy to isolate. I do this with debug levels, also called trace levels. Consider this expanded version of our code:

use strict;
use constant DEBUG => 0;  # 0 = quiet, 1 = milestones, 2 = per-link detail

use HTML::TokeParser;
parse_fresh_stream(
  HTML::TokeParser->new('fresh1.html') || die($!),
  'http://freshair.npr.org/dayFA.cfm?todayDate=07%2F02%2F2001'
);

sub parse_fresh_stream {
  use URI;
  my($stream, $base_url) = @_;
  DEBUG and print "About to parse stream with base $base_url\n";

  while(my $a_tag = $stream->get_tag('a')) {
    # $a_tag->[3] is the tag's original source text
    DEBUG > 1 and printf "Considering {%s}\n", $a_tag->[3];
    # Skip anchors with no href; resolve the rest against the base URL
    my $url = URI->new_abs( ($a_tag->[1]{'href'} || next), $base_url);
    # Each test below rejects the link and says why at debug level 2
    unless($url->scheme eq 'http') {
      DEBUG > 1 and print "Scheme is no good in $url\n";
      next;
    }
    unless($url->host =~ m/www\.npr\.org/) {
      DEBUG > 1 and print "Host is no good in $url\n";
      next;
    }
    unless($url->path =~ m{/ramfiles/.*\.ram$}) {
      DEBUG > 1 and print "Path is no good in $url\n";
      next;
    }
    DEBUG > 1 and print "IT'S GOOD!\n";
    # The link text runs up to the closing </a>; fall back to "??" if empty
    my $text = $stream->get_trimmed_text('/a') || "??";
    printf "%s\n  %s\n", $text, $url;
  }
  DEBUG and print "End of stream\n";
  return;
}
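
To see the debug-level idea on its own, apart from the HTML parsing, here is a minimal sketch. The trace_lines helper and its sample filenames are hypothetical, made up for illustration; only the DEBUG constant and the "DEBUG and ..." / "DEBUG > 1 and ..." idiom come from the listing above. It collects the messages instead of printing them directly, so the effect of each level is easy to inspect.

```perl
use strict;
use warnings;

# Trace level, as in the listing: 0 = quiet, 1 = milestones, 2 = per-item detail.
use constant DEBUG => 2;

# Hypothetical helper: return the trace messages enabled at the current level.
sub trace_lines {
  my @items = @_;
  my @out;
  DEBUG and push @out, "Starting with " . scalar(@items) . " items";
  foreach my $item (@items) {
    # Per-item chatter appears only at level 2 and above
    DEBUG > 1 and push @out, "Considering {$item}";
  }
  DEBUG and push @out, "Done";
  return @out;
}

print "$_\n" for trace_lines('a.ram', 'b.html');
```

Setting DEBUG to 1 should leave only the "Starting" and "Done" messages, and 0 should leave none. Because DEBUG is a compile-time constant, Perl can generally discard statements guarded by a false DEBUG test, so leaving the trace lines in place costs little when tracing is off.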

Among the notable changes in this expanded version, I'm making a URI object for each URL I'm scrutinizing, and to make a new absolute URI object out of each potentially relative URL, I have to pass the base URL as a parameter to the parse_fresh_stream( ...
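
As a rough illustration of why the base URL is needed, the following sketch resolves a few href values against it using URI->new_abs and then reads back the scheme, host, and path that the listing tests. The url_parts helper and the sample href values are hypothetical; the base URL is the one passed to parse_fresh_stream above.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: resolve $href against $base_url and return the
# pieces that the checks in parse_fresh_stream look at.
sub url_parts {
  my($href, $base_url) = @_;
  my $url    = URI->new_abs($href, $base_url);
  my $scheme = $url->scheme;
  my %parts  = (url => "$url", scheme => $scheme);
  # Ask for host and path only once we know this is an http URL,
  # since other schemes (such as mailto) may not offer those methods.
  if(defined $scheme and $scheme eq 'http') {
    $parts{host} = $url->host;
    $parts{path} = $url->path;
  }
  return \%parts;
}

my $base_url = 'http://freshair.npr.org/dayFA.cfm?todayDate=07%2F02%2F2001';

# Made-up href values: site-relative, document-relative, and non-http
foreach my $href ('/ramfiles/fa/sample.ram', 'notes.html', 'mailto:someone@example.com') {
  my $parts = url_parts($href, $base_url);
  printf "%s\n  becomes %s\n", $href, $parts->{url};
}
```

Checking the scheme before the host mirrors the order of the tests in the listing, which is what keeps a non-http link from reaching a method its URI class might not provide.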
