
Perl & LWP by Sean M. Burke


Basic HTML::TokeParser Use

The HTML::TokeParser module is a class for accessing HTML as tokens. An HTML::TokeParser object gives you one token at a time, much as a filehandle gives you one line at a time from a file. The HTML can be tokenized from a file or string. The tokenizer decodes entities in attributes, but not entities in text.
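To see that difference concretely, here's a minimal sketch. It uses the string constructor and the get_token method described below on a made-up snippet where the same entity appears in an attribute and in the text. The decode step at the end assumes you want plain text, and uses decode_entities from HTML::Entities, which ships in the same HTML-Parser distribution.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);
use HTML::TokeParser;

# Illustrative input: "&amp;" appears both in an attribute and in the text.
my $html   = '<b title="R&amp;D">R&amp;D</b>';
my $stream = HTML::TokeParser->new(\$html);

my $start = $stream->get_token;   # the start-tag token for <b>
my $text  = $stream->get_token;   # the text token between the tags

print "attribute: $start->[2]{title}\n";   # attribute value arrives decoded
print "text:      $text->[1]\n";           # text arrives with the entity intact

# Decoding entities in text is up to you.
print "decoded:   ", decode_entities($text->[1]), "\n";
```

The attribute should print as R&D, the raw text as R&amp;D, and the decoded text as R&D again.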

Create a token stream object using one of these two constructors:

my $stream = HTML::TokeParser->new($filename)
  || die "Couldn't read HTML file $filename: $!";

or:

my $stream = HTML::TokeParser->new( \$string_of_html );
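Here's a small sketch, assuming a Unix-like system with File::Temp available, that builds one stream from each constructor over the same HTML and pulls the first token from each with get_token (covered next). The nonexistent path at the end is purely illustrative; it shows that new returns undef, with $! set, when the file can't be opened, which is what the || die idiom relies on.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TokeParser;

my $html = "<title>Hello</title>\n";

# Constructor 1: a reference to a string holding the HTML.
my $from_string = HTML::TokeParser->new(\$html);

# Constructor 2: a filename. Write the same HTML to a temporary file first.
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh $html;
close $fh or die "Couldn't write $filename: $!";
my $from_file = HTML::TokeParser->new($filename)
  || die "Couldn't read HTML file $filename: $!";

# Both streams start with the same start-tag token.
for my $stream ($from_string, $from_file) {
    my $token = $stream->get_token;
    print "$token->[0] $token->[1]\n";    # prints "S title" for each stream
}

# A file that can't be opened makes new() return undef, with $! set.
my $missing = HTML::TokeParser->new('/no/such/file.html');
print defined $missing ? "opened\n" : "couldn't open: $!\n";
```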

Once you have that stream object, you get the next token by calling:

my $token = $stream->get_token();

The $token variable then holds an array reference, or undef if there's nothing left in the stream's file or string. This code processes every token in a document:

use HTML::TokeParser;

my $stream = HTML::TokeParser->new($filename)
  || die "Couldn't read HTML file $filename: $!";

while(my $token = $stream->get_token) {
  # ... consider $token ...
}
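As one way to fill in the "consider $token" part, here's a sketch that runs the same loop over a string and tallies tokens by their type code in $token->[0]. The count_token_types function is a hypothetical helper made up for this example, not part of HTML::TokeParser.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: run the get_token loop over an HTML string and
# count how many tokens of each type ("S", "E", "T", ...) it yields.
sub count_token_types {
    my ($html) = @_;
    my $stream = HTML::TokeParser->new(\$html);
    my %count;
    while (my $token = $stream->get_token) {
        $count{ $token->[0] }++;    # $token->[0] is the type code
    }
    return %count;                  # loop ended: get_token returned undef
}

my %count = count_token_types('<!DOCTYPE html><p>Hi <!-- note --></p>');
print "$_: $count{$_}\n" for sort keys %count;
```

That snippet should yield one each of a declaration, start-tag, text, comment, and end-tag token.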

The $token can have one of six kinds of values, distinguished first by the value of $token->[0], as shown in Table 7-1.

Table 7-1. Token types

Token                    Values
Start-tag                ["S",  $tag, $attribute_hashref, $attribute_order_arrayref, $source]
End-tag                  ["E",  $tag, $source]
Text                     ["T",  $text, $should_not_decode]
Comment                  ["C",  $source]
Declaration              ["D",  $source]
Processing instruction   ["PI", $content, $source]
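A common pattern is to dispatch on $token->[0] and then unpack the remaining fields according to Table 7-1. Here's a sketch of that; describe_token and its output wording are hypothetical, chosen only to show which array element holds what for each token type.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: turn one token into a one-line description,
# unpacking the fields Table 7-1 lists for each token type.
sub describe_token {
    my ($token) = @_;
    my $type = $token->[0];
    if ($type eq 'S') {
        my (undef, $tag, $attr, $attr_order) = @$token;
        # The order arrayref lists attribute names as they appeared in the source.
        return join ' ', "start-tag $tag", map { "$_=$attr->{$_}" } @$attr_order;
    }
    return "end-tag $token->[1]"                if $type eq 'E';
    return "text [$token->[1]]"                 if $type eq 'T';
    return "comment $token->[1]"                if $type eq 'C';
    return "declaration $token->[1]"            if $type eq 'D';
    return "processing instruction $token->[2]" if $type eq 'PI';   # [2] is $source
    return "unknown token type $type";
}

my $html   = '<!DOCTYPE html><!-- hi --><a href="/x" title="X">go</a>';
my $stream = HTML::TokeParser->new(\$html);
while (my $token = $stream->get_token) {
    print describe_token($token), "\n";
}
```

For the start-tag in that snippet, this should print start-tag a href=/x title=X, with the attributes in the order they were written.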

Start-Tag Tokens

If $token->[0] is "S", the token represents a start-tag:

["S",  $tag, $attribute_hashref, $attribute_order_arrayref, $source]

The ...
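As an illustration of working with the start-tag layout, here's a sketch that collects the href of every <a> start-tag. The link_targets name and the sample HTML are hypothetical. It relies on HTML::TokeParser lowercasing tag and attribute names by default, so <A HREF="..."> matches the test for 'a' and 'href'.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the href of every <a> start-tag, in document order.
sub link_targets {
    my ($html) = @_;
    my $stream = HTML::TokeParser->new(\$html);
    my @hrefs;
    while (my $token = $stream->get_token) {
        next unless $token->[0] eq 'S';
        # Unpack the five start-tag fields; only $tag and $attr are needed here.
        my (undef, $tag, $attr, $attr_order, $source) = @$token;
        next unless $tag eq 'a' && exists $attr->{href};
        push @hrefs, $attr->{href};
    }
    return @hrefs;
}

my $html = '<A HREF="/a.html">A</A> <a name="top">anchor</a> <a href="/b.html">B</a>';
print "$_\n" for link_targets($html);
```

The <a name="top"> tag has no href, so it's skipped, leaving /a.html and /b.html.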
