O'Reilly logo

Python Standard Library by Fredrik Lundh

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

The sgmllib Module

The sgmllib module, shown in Example 5-5, provides a basic SGML parser. It works pretty much the same as the xmllib parser, but is less restrictive (and less complete).

Like in xmllib, this parser calls methods in itself to deal with things like start tags, data sections, end tags, and entities. If you’re only interested in a few tags, you can define special start and end methods.

Example 5-5. Using the sgmllib Module to Extract the Title Element

File: sgmllib-example-1.py

import sgmllib
import string

class FoundTitle(Exception):
    pass

class ExtractTitle(sgmllib.SGMLParser):

    def _ _init_ _(self, verbose=0):
        sgmllib.SGMLParser._ _init_ _(self, verbose)
        self.title = self.data = None

    def handle_data(self, data):
        if self.data is not None:
            self.data.append(data)

    def start_title(self, attrs):
        self.data = []

    def end_title(self):
        self.title = string.join(self.data, "")
        raise FoundTitle # abort parsing!

def extract(file):
    # extract title from an HTML/SGML stream
    p = ExtractTitle()
    try:
        while 1:
            # read small chunks
            s = file.read(512)
            if not s:
                break
            p.feed(s)
        p.close()
    except FoundTitle:
        return p.title
    return None

#
# try it out

print "html", "=>", extract(open("samples/sample.htm"))
print "sgml", "=>", extract(open("samples/sample.sgm"))

html => A Title.
sgml => Quotations

To handle all tags, overload the unknown_starttag and unknown_endtag methods instead, as Example 5-6 demonstrates.

Example 5-6. Using the sgmllib Module to Format an SGML Document

File: sgmllib-example-2.py ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required