The xmllib Module
The xmllib module provides a simple XML parser, using regular expressions to pull the XML data apart, as shown in Example 5-1. The parser performs basic checks on the document, for example that there is only one top-level element and that all tags are balanced.
You feed XML data to this parser piece by piece (as data arrives over a network, for example). The parser calls its own methods for start tags, data sections, end tags, and entities, among other things.
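The piece-by-piece feeding described above can be sketched as follows. This is a minimal illustration, not the xmllib implementation: xmllib was removed in Python 3, so the sketch assumes Python 3 and substitutes the standard xml.parsers.expat parser, which supports the same incremental style. The function name parse_in_chunks and the sample chunks are hypothetical.

```python
import xml.parsers.expat


def parse_in_chunks(chunks):
    """Feed XML to an expat parser piece by piece and record the events."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Each handler appends a tuple describing the event it received.
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name, attrs))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("data", data))
    for chunk in chunks:
        parser.Parse(chunk, False)  # False: more data may follow
    parser.Parse("", True)          # True: end of document
    return events


# The start tag and the text are deliberately split across chunks,
# as they might be when data arrives over a network.
chunks = ['<quotation id="0', '31">Hel', 'lo</quotation>']
events = parse_in_chunks(chunks)
text = "".join(e[1] for e in events if e[0] == "data")
```

The start event is only reported once the complete tag has arrived, so the handler sees the full attribute dictionary even though the tag was split. Character data may be delivered in several fragments, which is why the sketch joins them afterward.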
If you’re only interested in a few tags, you can define special start_tag and end_tag methods, where tag is the tag name. The start_tag methods are called with the element’s attributes given as a dictionary.
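The name-based dispatch described above can be sketched roughly as follows. This is an assumption about the general mechanism rather than xmllib's actual code: the sketch targets Python 3, uses xml.parsers.expat for the parsing itself, and looks up handlers with getattr. The class names TagDispatcher and QuotationIds are hypothetical.

```python
import xml.parsers.expat


class TagDispatcher:
    """Hypothetical helper: routes events to start_<tag> / end_<tag> methods."""

    def __init__(self):
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end

    def _start(self, tag, attrs):
        # Call start_<tag>(attrs) if the subclass defines it; otherwise ignore.
        handler = getattr(self, "start_" + tag, None)
        if handler is not None:
            handler(attrs)

    def _end(self, tag):
        # Call end_<tag>() if the subclass defines it; otherwise ignore.
        handler = getattr(self, "end_" + tag, None)
        if handler is not None:
            handler()

    def feed(self, data):
        self._parser.Parse(data, False)

    def close(self):
        self._parser.Parse("", True)


class QuotationIds(TagDispatcher):
    """Collects the id attribute of every quotation element."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def start_quotation(self, attrs):
        self.ids.append(attrs.get("id"))


p = QuotationIds()
p.feed('<doc><quotation id="031">text</quotation><other/></doc>')
p.close()
print(p.ids)
```

Tags without a matching method, such as doc and other here, are skipped, so a subclass only needs to define handlers for the elements it cares about.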
Example 5-1. Using the xmllib Module to Extract Information from an Element
File: xmllib-example-1.py
import xmllib

class Parser(xmllib.XMLParser):
    # get quotation number

    def __init__(self, file=None):
        xmllib.XMLParser.__init__(self)
        if file:
            self.load(file)

    def load(self, file):
        # feed the file to the parser in 512-byte chunks
        while 1:
            s = file.read(512)
            if not s:
                break
            self.feed(s)
        self.close()

    def start_quotation(self, attrs):
        print "id =>", attrs.get("id")
        # stop parsing as soon as the first quotation has been seen
        raise EOFError

try:
    c = Parser()
    c.load(open("samples/sample.xml"))
except EOFError:
    pass
id => 031
Example 5-2 contains a simple (and incomplete) rendering engine. The parser maintains an element stack (__tags), which it passes to the renderer, together with text fragments. The renderer looks up the current tag hierarchy in a style dictionary, and if it isn’t already there, ...