Controlling XSLT Stylesheet Loading

Credit: Jürgen Hermann

Problem

You need to process XML documents and access external documents (e.g., stylesheets), but you can’t use filesystem paths (to keep documents portable) or Internet-accessible URLs (for performance and security).

Solution

4Suite’s xml.xslt package (http://www.4suite.org/) gives you all the power you need to handle XML stylesheets, including the hooks for sophisticated needs such as those met by this recipe:

# uses 4Suite Version 0.10.2 or later
from xml.xslt.Processor import Processor
from xml.xslt.StylesheetReader import StylesheetReader

class StylesheetFromDict(StylesheetReader):
    "A stylesheet reader that loads XSLT stylesheets from a python dictionary"

    def _ _init_ _(self, styles, *args):
        "Remember the dict we want to load the stylesheets from"
        StylesheetReader._ _init_ _(self, *args)
        self.styles = styles
        self._ _myargs = args

    def _ _getinitargs_ _(self):
        "Return init args for clone(  )"
        return (self.styles,) + self._ _myargs

    def fromUri(self, uri, baseUri='', ownerDoc=None, stripElements=None):
        "Load stylesheet from a dict"
        parts = uri.split(':', 1)
        if parts[0] == 'internal' and self.styles.has_key(parts[1]):
            # Load the stylesheet from the internal repository (your dictionary)
            return StylesheetReader.fromString(self, self.styles[parts[1]],
                baseUri, ownerDoc, stripElements)
        else:
            # Revert to normal behavior
            return StylesheetReader.fromUri(self, uri,
                baseUri, ownerDoc, stripElements)

if _ _name_ _ == "_ _main_ _":
    # test and example of this stylesheet's loading approach

    # the sample stylesheet repository
    internal_stylesheets = {
        'second-author.xsl': """
            <person xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xsl:version="1.0">
            <xsl:value-of select="books/book/author[2]"/>
            </person>
        """
    }

    # the sample document, referring to an "internal" stylesheet
    xmldoc = """
      <?xml-stylesheet href="internal:second-author.xsl"
          type="text/xml"?>
      <books>
        <book title="Python Essential Reference">
          <author>David M. Beazley</author>
          <author>Guido van Rossum</author>
        </book>
      </books>
    """

    # Create XSLT processor and run it
    processor = Processor(  )
    processor.setStylesheetReader(StylesheetFromDict(internal_stylesheets))
    print processor.runString(xmldoc)

Discussion

If you get a lot of XML documents from third parties (via FTP, HTTP, or other means), problems could arise because the documents were created in their environments, and now you must process them in your environment. If a document refers to external files (such as stylesheets) in the filesystem of the remote host, these paths often do not make sense on your local host. One common solution is to refer to external documents through public URLs accessible via the Internet, but this, of course, incurs substantial overhead (you need to fetch the stylesheet from the remote server) and poses some risks. (What if the remote server is down? What about privacy and security?)

Another approach is to use private URL schemes, such as stylesheet:layout.xsl. These need to be resolved to real, existing URLs, which this recipe’s code does for XSLT processing. We show how to use a hook offered by 4Suite, a Python XSLT engine, to refer to stylesheets in an XML-Stylesheet processing instruction (see http://www.w3.org/TR/xml-stylesheet/).

A completely analogous approach can be used to load the stylesheet from a database or return a locally cached stylesheet previously fetched from a remote URL. The essence of this recipe is that you can subclass StylesheetReader and customize the fromUri method to perform whatever resolution of private URL schemes you require. The recipe specifically looks at the URL’s protocol. If it’s internal: followed by a name that is a known key in an internal dictionary that maps names to stylesheets, it returns the stylesheet by delegating the parsing of the dictionary entry’s value to the fromString method of StylesheetReader. In all other cases, it leaves the URI alone and delegates to the parent class’s method.

The output of the test code is:

<?xml version='1.0' encoding='UTF-8'?>
<person>Guido van Rossum</person>

This recipe requires at least Python 2.0 and 4Suite Version 0.10.2.

See Also

The XML-Stylesheet processing instruction is described in a W3C recommendation (http://www.w3.org/TR/xml-stylesheet/); the 4Suite tools from FourThought are available at http://www.4suite.org/.

Get Python Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.