The xmllib Module

The xmlib module provides a simple XML parser, using regular expressions to pull the XML data apart, as shown in Example 5-1. The parser does basic checks on the document, such as a check to see that there is only one top-level element and a check to see that all tags are balanced.

You feed XML data to this parser piece by piece (as data arrives over a network, for example). The parser calls methods in itself for start tags, data sections, end tags, and entities, among other things.

If you’re only interested in a few tags, you can define special start_tag and end_tag methods, where tag is the tag name. The start functions are called with the attributes given as a dictionary.

Example 5-1. Using the xmllib Module to Extract Information from an Element

File: xmllib-example-1.py

import xmllib

class Parser(xmllib.XMLParser):
    # get quotation number

    def _ _init_ _(self, file=None):
        xmllib.XMLParser._ _init_ _(self)
        if file:
            self.load(file)

    def load(self, file):
        while 1:
            s = file.read(512)
            if not s:
                break
            self.feed(s)
        self.close()

    def start_quotation(self, attrs):
        print "id =>", attrs.get("id")
        raise EOFError

try:
    c = Parser()
    c.load(open("samples/sample.xml"))
except EOFError:
    pass

id => 031

Example 5-2 contains a simple (and incomplete) rendering engine. The parser maintains an element stack (_ _tags), which it passes to the renderer, together with text fragments. The renderer looks up the current tag hierarchy in a style dictionary, and if it isn’t already there, ...

Get Python Standard Library now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.