As you get into the more advanced features of SAX, you certainly don’t reduce the number of problems you can get yourself into. However, these problems often become more subtle, which makes for some tricky bugs to track down. I’ll point out a few of these common problems.
As I mentioned in the section on EntityResolvers, you should always ensure that you return null as a starting point for resolveEntity( ) method implementations. Luckily, Java ensures that you return something from the method, but I’ve often seen code like this:
    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {

        InputSource inputSource = new InputSource( );

        // Handle references to online version of copyright.xml
        if (systemID.equals(
            "http://www.newInstance.com/javaxml2/copyright.xml")) {
            inputSource.setSystemId(
                "file:///c:/javaxml2/ch04/xml/copyright.xml");
        }

        // In the default case, return null
        return inputSource;
    }
As you can see, an InputSource is created initially and then the system ID is set on that source. The problem here is that if no if blocks are entered, an InputSource with no system or public ID, as well as no specified Reader or InputStream, is returned. This can lead to unpredictable results; in some parsers, things continue with no problems. In other parsers, though, returning an empty InputSource results in entities being ignored, or in exceptions being thrown. In other words, return null at the end of every resolveEntity( ) implementation, and you won’t have to worry about these details.
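As a minimal sketch of that advice, the resolver below creates an InputSource only when it actually has a replacement to offer and falls through to null otherwise. The class name SafeEntityResolver is hypothetical; the two URIs are the ones from the listing above.

```java
import java.io.IOException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration
class SafeEntityResolver implements EntityResolver {

    private static final String REMOTE_COPYRIGHT =
        "http://www.newInstance.com/javaxml2/copyright.xml";
    private static final String LOCAL_COPYRIGHT =
        "file:///c:/javaxml2/ch04/xml/copyright.xml";

    @Override
    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {

        // Handle references to online version of copyright.xml.
        // The constant-first comparison also tolerates a null systemID.
        if (REMOTE_COPYRIGHT.equals(systemID)) {
            return new InputSource(LOCAL_COPYRIGHT);
        }

        // In the default case, really return null so the parser
        // falls back to its normal resolution of the entity
        return null;
    }
}
```

Because an InputSource is built only inside the matching branch, there is no way for an empty one to leak out of the method.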
I’ve described setting properties and features in this chapter, their effect on validation, and also the DTDHandler interface. In all that discussion of DTDs and validation, it’s possible you got a few things mixed up; I want to be clear that the DTDHandler interface has nothing at all to do with validation. I’ve seen many developers register a DTDHandler and wonder why validation isn’t occurring. However, DTDHandler doesn’t do anything but provide notification of notation and unparsed entity declarations! Probably not what the developer expected. Remember that it’s a feature that sets validation, not a handler instance:
reader.setFeature("http://xml.org/sax/features/validation", true);
Anything less than this (short of a parser validating by default) won’t get you validation, and probably won’t make you very happy.
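To make the distinction concrete, here is a small sketch that parses the same invalid document twice with a DTDHandler registered both times, toggling only the validation feature. The class name ValidationDemo, the method countValidityErrors( ), and the inline document are all made up for illustration, and I'm assuming the parser bundled with the JDK, obtained through SAXParserFactory, which does not validate by default.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; assumes the JDK's bundled SAX parser
class ValidationDemo {

    // Well-formed but invalid: the DTD requires a <child> inside <root>
    static final String XML =
          "<!DOCTYPE root [\n"
        + "  <!ELEMENT root (child)>\n"
        + "  <!ELEMENT child EMPTY>\n"
        + "  <!NOTATION gif SYSTEM \"image/gif\">\n"
        + "]>\n"
        + "<root/>";

    // Returns how many recoverable errors (validity errors) were reported
    static int countValidityErrors(boolean validate) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        final int[] errors = {0};
        // DefaultHandler implements both DTDHandler and ErrorHandler
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors[0]++;
            }
        };

        // Registering a DTDHandler does not turn on validation...
        reader.setDTDHandler(handler);
        reader.setErrorHandler(handler);

        // ...only the feature does
        reader.setFeature(
            "http://xml.org/sax/features/validation", validate);

        reader.parse(new InputSource(new StringReader(XML)));
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Feature off: " + countValidityErrors(false));
        System.out.println("Feature on:  " + countValidityErrors(true));
    }
}
```

With the feature off, the missing child element goes unreported even though the DTDHandler is in place; with it on, the error handler hears about it.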
I’ve talked about pipelines in SAX in this chapter, and hopefully you got an idea of how useful they could be. However, there’s an error I see among filter beginners time and time again, and it’s a frustrating one to deal with. The problem is setting up the pipeline chain incorrectly: this occurs when a filter does not set the preceding filter as its parent, in a chain that should begin with an XMLReader instance. Check out this code fragment:
    public void buildTree(DefaultTreeModel treeModel,
                          DefaultMutableTreeNode base, String xmlURI)
        throws IOException, SAXException {

        // Create instances needed for parsing
        XMLReader reader =
            XMLReaderFactory.createXMLReader(vendorParserClass);
        XMLWriter writer =
            new XMLWriter(reader, new FileWriter("snapshot.xml"));
        NamespaceFilter filter = new NamespaceFilter(reader,
            "http://www.oreilly.com/javaxml2",
            "http://www.oreilly.com/catalog/javaxml2");
        ContentHandler jTreeContentHandler =
            new JTreeContentHandler(treeModel, base, reader);
        ErrorHandler jTreeErrorHandler = new JTreeErrorHandler( );

        // Register content handler
        reader.setContentHandler(jTreeContentHandler);

        // Register error handler
        reader.setErrorHandler(jTreeErrorHandler);

        // Register entity resolver
        reader.setEntityResolver(new SimpleEntityResolver( ));

        // Parse
        InputSource inputSource = new InputSource(xmlURI);
        reader.parse(inputSource);
    }
See anything wrong? Parsing is occurring on the XMLReader instance, not at the end of the pipeline chain. In addition, the NamespaceFilter instance sets its parent to the XMLReader, instead of the XMLWriter instance that should precede it in the chain. These errors are not obvious, and will throw your intended pipeline into chaos. In this example, no filtering will occur at all, because parsing occurs on the reader, not the filters. If you correct that error, you still won’t get output, as the writer is left out of the pipeline through improper setting of the NamespaceFilter’s parent. Setting the parent properly sets you up, though, and you’ll finally get the behavior you expected in the first place. Be very careful with parentage and parsing when handling SAX pipelines.
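For comparison, here is a sketch of a chain wired the right way around. Since XMLWriter and NamespaceFilter aren't part of the JDK, I've substituted a made-up LoggingFilter built on the standard XMLFilterImpl helper; the class names, the log list, and the one-element document are all assumptions for illustration. What matters is the shape: each filter takes the previous link as its parent, and both handler registration and parse( ) happen on the last filter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class standing in for the XMLWriter/NamespaceFilter chain
class PipelineDemo {

    // Hypothetical filter: logs its name for each element, then passes it on
    static class LoggingFilter extends XMLFilterImpl {
        private final String name;
        private final List<String> log;

        LoggingFilter(XMLReader parent, String name, List<String> log) {
            super(parent);
            this.name = name;
            this.log = log;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts)
            throws SAXException {
            log.add(name);
            super.startElement(uri, localName, qName, atts);
        }
    }

    static List<String> run() throws Exception {
        final List<String> log = new ArrayList<>();

        // The chain starts at a real XMLReader...
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // ...and each filter's parent is the link just before it
        LoggingFilter first = new LoggingFilter(reader, "first", log);
        LoggingFilter second = new LoggingFilter(first, "second", log);

        // Register handlers on the END of the chain, not on the reader
        second.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                log.add("handler");
            }
        });

        // Parse on the END of the chain as well
        second.parse(new InputSource(new StringReader("<doc/>")));
        return log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

If you call parse( ) on reader instead of second in this sketch, neither filter sees an event, which is the same failure described above.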