The HTMLParser Module
Module HTMLParser supplies one class, HTMLParser, that you subclass to override and add methods. HTMLParser.HTMLParser is similar to sgmllib.SGMLParser, but is simpler and able to parse XHTML as well. The main differences between HTMLParser and SGMLParser are the following:
HTMLParser does not call back to methods named do_tag, start_tag, and end_tag. To process tags and end tags, your subclass X of HTMLParser must override methods handle_starttag and/or handle_endtag and check explicitly for the tags it wants to process.
HTMLParser does not keep track of, nor check, tag nesting in any way.
HTMLParser does nothing, by default, to resolve character and entity references. Your subclass X of HTMLParser must override methods handle_charref and/or handle_entityref if it needs to perform processing of such references.
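The differences listed above can be illustrated with a minimal sketch. It makes several assumptions: it targets Python 3, where this module is available as html.parser (under the Python version described here, the import would be from HTMLParser import HTMLParser); the class name LinkCollector and its links and text attributes are hypothetical; and it passes convert_charrefs=False, which Python 3.4 and later require so that references are still routed to handle_charref and handle_entityref.

```python
# Python 3 sketch: the module described in the text is exposed as html.parser.
from html.parser import HTMLParser
from html.entities import name2codepoint


class LinkCollector(HTMLParser):
    """Hypothetical subclass X: collects href values of <a> tags and plain text."""

    def __init__(self):
        # convert_charrefs=False (Python 3.4+) keeps the behavior described in
        # the text: references are passed to handle_charref/handle_entityref
        # instead of being converted automatically.
        super().__init__(convert_charrefs=False)
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Called for every start tag; check explicitly for the one we want.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

    def handle_data(self, data):
        # Plain text between tags.
        self.text.append(data)

    def handle_charref(self, name):
        # name is the text between "&#" and ";", for example "65" or "x42".
        if name[:1] in ("x", "X"):
            self.text.append(chr(int(name[1:], 16)))
        else:
            self.text.append(chr(int(name)))

    def handle_entityref(self, name):
        # name is the text between "&" and ";", for example "amp".
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            self.text.append("&" + name + ";")  # unknown entity: keep as written
        else:
            self.text.append(chr(codepoint))


h = LinkCollector()
h.feed('<p>Tom &amp; Jerry &#65;&#x42; <a href="x.html">link</a></p>')
h.close()
print(h.links)
print("".join(h.text))
```

Note that the p tags pass through handle_starttag without effect, because the method checks explicitly for a, and that nothing here verifies that tags are properly nested; a subclass that needs such checks has to maintain its own stack.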
The most frequently used methods of an instance h of a subclass X of HTMLParser are as follows.