
Perl & LWP by Sean M. Burke


More HTML::TokeParser Methods

Example 7-1 illustrates that you often aren't interested in every token in a stream, only in tokens of a certain kind. The HTML::TokeParser interface supports this with three methods, get_tag( ), get_text( ), and get_trimmed_text( ), that do more than simply get the next token.

$text_string = $stream->get_text( );

If the next token is text, return its value.

$text_string = $stream->get_text('foo');

Return all text up to the next foo start-tag.

$text_string = $stream->get_text('/bar');

Return all text up to the next /bar end-tag.

$text = $stream->get_trimmed_text( );
$text = $stream->get_trimmed_text('foo');
$text = $stream->get_trimmed_text('/bar');

Like the corresponding get_text( ) calls, except that leading and trailing whitespace is removed and every other run of whitespace is collapsed to a single space.

$tag_ref = $stream->get_tag( );

Return the next start-tag or end-tag token.

$tag_ref = $stream->get_tag('foo', '/bar', 'baz');

Return the next foo start-tag, /bar end-tag, or baz start-tag.

We will explain these methods in detail in the following sections.
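Before the details, a rough sketch of how these methods can work together may help. It collects each link's URL and label by alternating get_tag('a') with get_trimmed_text('/a'). The sample HTML and the variable names ($html, @links, $href, $label) are invented for illustration. The sketch assumes the start-tag array layout from the HTML::TokeParser documentation, where element 1 is the attribute hash.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample document, made up for this illustration.
my $html = <<'END_HTML';
<p>See <a href="/one.html">  the
   first   page </a> and
<a href="/two.html">the <b>second</b> page</a>.</p>
END_HTML

# A reference to a scalar tells HTML::TokeParser to parse that string.
my $stream = HTML::TokeParser->new(\$html);

my @links;
# get_tag('a') discards tokens until it reaches the next <a ...> start-tag,
# and returns undef once the stream is exhausted.
while (my $tag = $stream->get_tag('a')) {
    my $href = $tag->[1]{href};    # element 1 of a start-tag is the attribute hash
    next unless defined $href;

    # Gather all text up to the next </a>, passing over the nested <b> tags,
    # then trim the ends and collapse internal whitespace.
    my $label = $stream->get_trimmed_text('/a');
    push @links, [$href, $label];
}

print "$_->[0]\t$_->[1]\n" for @links;
# prints:
# /one.html	the first page
# /two.html	the second page
```

Using get_trimmed_text( ) rather than get_text( ) here means the line breaks and extra spaces inside the first link do not end up in its label.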

The get_text( ) Method

The get_text( ) syntax is:

$text_string = $stream->get_text( );

If $stream's next token is text, this gets it, resolves any entities in it, and returns its string value. Otherwise, this returns an empty string.

For example, if you are parsing this snippet:

<h1 lang='en-GB'>Shatner Reprises Kirk R&ocirc;le</h1>

and have just parsed the token for h1, $stream->get_text( ) returns "Shatner Reprises Kirk Rôle". ...
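As a minimal sketch of the behavior just described, the snippet can be parsed from a string and read with get_text( ). The variable names ($html, $tag, $text, $again) are only illustrative, and get_tag('h1') stands in here for "having just parsed the token for h1".

```perl
use strict;
use warnings;
use HTML::TokeParser;

my $html = q{<h1 lang='en-GB'>Shatner Reprises Kirk R&ocirc;le</h1>};
my $stream = HTML::TokeParser->new(\$html);

# Consume the <h1> start-tag so that the next token is the heading text.
my $tag = $stream->get_tag('h1');

# The next token is text: get_text( ) returns it with &ocirc; decoded
# to the single character U+00F4.
my $text = $stream->get_text();

# Now the next token is the </h1> end-tag, not text, so this is ''.
my $again = $stream->get_text();

binmode(STDOUT, ':encoding(UTF-8)');    # the decoded text is non-ASCII
print "First call:  '$text'\n";
print "Second call: '$again'\n";
```

The second call shows the "otherwise" case: with no arguments, get_text( ) stops at the first tag it meets and leaves that tag in the stream for a later get_tag( ) or get_token( ).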
