The htmllib Module
The htmllib module supplies a class named HTMLParser that subclasses SGMLParser and defines start_tag, do_tag, and end_tag methods for tags defined in HTML 2.0. HTMLParser implements and overrides methods in terms of calls to methods of a formatter object, covered later in this chapter. You can subclass HTMLParser to add or override methods. In addition to the start_tag, do_tag, and end_tag methods, an instance h of HTMLParser supplies the following attributes and methods.
Reference Section
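The start_tag, do_tag, and end_tag naming convention described above can be illustrated with a small, self-contained sketch. This is not htmllib's actual implementation: ToyDispatcher, LinkLogger, handle_start, handle_end, and the events list are hypothetical names introduced only to show how a parser might route each tag to a method chosen by name, and how a subclass adds handlers.

```python
class ToyDispatcher(object):
    """Hypothetical stand-in that routes tags to start_/do_/end_ methods."""

    def __init__(self):
        self.events = []  # record of handler calls, for illustration only

    def handle_start(self, tag, attrs):
        # Prefer start_<tag> (a tag that expects a closing tag),
        # then fall back to do_<tag> (a tag with no closing tag).
        method = (getattr(self, 'start_' + tag, None)
                  or getattr(self, 'do_' + tag, None))
        if method is None:
            self.events.append(('unknown', tag))
        else:
            method(attrs)

    def handle_end(self, tag):
        # Closing tags are routed to end_<tag>, if such a method exists.
        method = getattr(self, 'end_' + tag, None)
        if method is not None:
            method()


class LinkLogger(ToyDispatcher):
    """Subclass that adds handlers simply by defining well-named methods."""

    def start_a(self, attrs):
        self.events.append(('start_a', dict(attrs).get('href')))

    def end_a(self):
        self.events.append(('end_a', None))

    def do_br(self, attrs):
        self.events.append(('do_br', None))
```

Defining a method such as start_a on a subclass is enough for the dispatcher to find it; no registration step is needed, which is the appeal of the convention.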
The formatter Module
The formatter module defines formatter and writer classes. You instantiate a formatter by passing to the class a writer instance, and then you pass the formatter instance to class HTMLParser of module htmllib. You can define your own formatters and writers by subclassing formatter’s classes and overriding methods appropriately, but I do not cover this advanced and rarely used possibility in this book. An application with special output requirements would typically define an appropriate writer, subclassing AbstractWriter and overriding all methods, and use class AbstractFormatter without needing to subclass it. Module formatter supplies the following classes.
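The writer, formatter, and parser chain described above might be wired together roughly as follows. This is a minimal sketch that assumes an interpreter on which both htmllib and formatter can be imported; the helper name html_to_text, the choice of DumbWriter as the writer, and the fallback of returning None when the modules are missing are illustrative assumptions, not part of the original text.

```python
# Assumption: htmllib and formatter are importable (they ship with Python 2).
try:
    import htmllib
    import formatter
    from StringIO import StringIO
except ImportError:
    htmllib = None  # modules unavailable on this interpreter


def html_to_text(html):
    """Render HTML as plain text, or return None if htmllib is missing.

    html_to_text is a hypothetical helper written for this sketch.
    """
    if htmllib is None:
        return None
    out = StringIO()
    writer = formatter.DumbWriter(out)          # writer: emits text to a file-like object
    fmt = formatter.AbstractFormatter(writer)   # formatter: built from the writer
    parser = htmllib.HTMLParser(fmt)            # parser: built from the formatter
    parser.feed(html)
    parser.close()                              # flush any pending output
    return out.getvalue()
```

Swapping DumbWriter for a custom AbstractWriter subclass changes where and how the output is produced without touching the formatter or the parser.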
The htmlentitydefs Module
The htmlentitydefs module supplies just one attribute, a dictionary named entitydefs that maps each entity defined in HTML 2.0 to the corresponding ...
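A lookup in entitydefs might look like the following sketch. The helper name expand_entity is hypothetical, and the fallback import from html.entities is an addition not mentioned in the original text, included only so the example also runs where the module goes by that name.

```python
try:
    from htmlentitydefs import entitydefs   # module name used in this chapter
except ImportError:
    from html.entities import entitydefs    # assumption: later name for the same dictionary


def expand_entity(name):
    """Return the replacement text for an entity name, or None if unknown.

    expand_entity is a hypothetical helper written for this sketch.
    """
    return entitydefs.get(name)


# Entity names are the dictionary keys, written without '&' or ';'.
ampersand = expand_entity('amp')
less_than = expand_entity('lt')
```

Using dict.get rather than indexing lets the caller decide how to treat an entity name that is not defined.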