Gotcha!

As you get into the more advanced features of SAX, you certainly don’t reduce the number of problems you can get yourself into. However, these problems often become more subtle, which makes for some tricky bugs to track down. I’ll point out a few of these common problems.

Return Values from an EntityResolver

As I mentioned in the section on EntityResolvers, you should always ensure that you return null as a starting point for resolveEntity( ) method implementations. Luckily, Java ensures that you return something from the method, but I’ve often seen code like this:

    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {

        InputSource inputSource = new InputSource( );

        // Handle references to online version of copyright.xml   
        if (systemID.equals(
            "http://www.newInstance.com/javaxml2/copyright.xml")) {
            inputSource.setSystemId(
                "file:///c:/javaxml2/ch04/xml/copyright.xml");
        }            
        
        // In the default case, return null
        return inputSource;    
    }

As you can see, an InputSource is created initially and then the system ID is set on that source. The problem here is that if no if blocks are entered, an InputSource with no system or public ID, as well as no specified Reader or InputStream, is returned. This can lead to unpredictable results; in some parsers, things continue with no problems. In other parsers, though, returning an empty InputSource results in entities being ignored, or in exceptions being thrown. In other words, return null at the end of every resolveEntity( ) implementation, and you won’t have to worry about these details.
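For contrast, here is a minimal sketch of the same resolver with the null default in place. The class name SafeEntityResolver and the two constants are hypothetical names introduced for illustration; the URL and the local path are the ones from the listing above.

```java
import java.io.IOException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name; only the resolveEntity( ) logic matters here
class SafeEntityResolver implements EntityResolver {

    // Values taken from the listing above
    private static final String REMOTE_COPYRIGHT =
        "http://www.newInstance.com/javaxml2/copyright.xml";
    private static final String LOCAL_COPYRIGHT =
        "file:///c:/javaxml2/ch04/xml/copyright.xml";

    @Override
    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {

        // Handle references to online version of copyright.xml.
        // The InputSource is created only inside the matching branch.
        if (REMOTE_COPYRIGHT.equals(systemID)) {
            return new InputSource(LOCAL_COPYRIGHT);
        }

        // In the default case, return null so the parser
        // falls back to its normal entity resolution
        return null;
    }
}
```

Because an InputSource is only ever constructed inside a matching branch, there is no way for an empty one to leak out of the method.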

DTDHandler and Validation

I’ve described setting properties and features in this chapter, their effect on validation, and also the DTDHandler interface. In all that discussion of DTDs and validation, it’s possible you got a few things mixed up; I want to be clear that the DTDHandler interface has nothing at all to do with validation. I’ve seen many developers register a DTDHandler and wonder why validation isn’t occurring. However, DTDHandler doesn’t do anything but provide notification of notation and unparsed entity declarations! Probably not what the developer expected. Remember that it’s a feature that turns on validation, not a handler instance:

    reader.setFeature("http://xml.org/sax/features/validation", true);

Anything less than this (short of a parser validating by default) won’t get you validation, and probably won’t make you very happy.
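To see the difference for yourself, the following sketch parses a deliberately invalid document twice, once with only handlers registered and once with the validation feature turned on. It assumes the parser bundled with the JDK, obtained through SAXParserFactory; the class name ValidationDemo, the method name countValidityErrors, and the sample document are all made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the chapter's example code
class ValidationDemo {

    // Made-up document: the DTD allows only <title> inside <book>,
    // but an undeclared <extra/> element appears as well
    static final String INVALID_XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE book [\n"
      + "  <!ELEMENT book (title)>\n"
      + "  <!ELEMENT title (#PCDATA)>\n"
      + "]>\n"
      + "<book><title>Java and XML</title><extra/></book>";

    // Returns how many errors were reported to the ErrorHandler
    static int countValidityErrors(boolean validate) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        final List<SAXParseException> errors = new ArrayList<>();

        // DefaultHandler implements both DTDHandler and ErrorHandler
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e);
            }
        };

        // Registering a DTDHandler does not request validation
        reader.setDTDHandler(handler);
        reader.setErrorHandler(handler);

        // Only the feature turns validation on
        if (validate) {
            reader.setFeature("http://xml.org/sax/features/validation", true);
        }

        reader.parse(new InputSource(new StringReader(INVALID_XML)));
        return errors.size();
    }
}
```

With a parser that does not validate by default, countValidityErrors(false) should report nothing even though a DTDHandler is registered, while countValidityErrors(true) should report at least one validity error for the same document.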

Parsing on the Reader Instead of the Filter

I’ve talked about pipelines in SAX in this chapter, and hopefully you got an idea of how useful they can be. However, there’s an error I see among filter beginners time and time again, and it’s a frustrating one to deal with. The problem is setting up the pipeline chain incorrectly: in a correct chain, each filter sets the preceding filter as its parent, with the first filter’s parent being the XMLReader instance, and parsing is invoked on the last filter. Check out this code fragment, which gets this wrong:

    public void buildTree(DefaultTreeModel treeModel, 
                          DefaultMutableTreeNode base, String xmlURI) 
        throws IOException, SAXException {

        // Create instances needed for parsing
        XMLReader reader = 
            XMLReaderFactory.createXMLReader(vendorParserClass);        
        XMLWriter writer =
            new XMLWriter(reader, new FileWriter("snapshot.xml"));            
        NamespaceFilter filter = 
            new NamespaceFilter(reader, 
                "http://www.oreilly.com/javaxml2",
                "http://www.oreilly.com/catalog/javaxml2");
        ContentHandler jTreeContentHandler = 
            new JTreeContentHandler(treeModel, base, reader);
        ErrorHandler jTreeErrorHandler = new JTreeErrorHandler( );

        // Register content handler
        reader.setContentHandler(jTreeContentHandler);

        // Register error handler
        reader.setErrorHandler(jTreeErrorHandler);
            
        // Register entity resolver
        reader.setEntityResolver(new SimpleEntityResolver( ));

        // Parse
        InputSource inputSource = 
            new InputSource(xmlURI);
        reader.parse(inputSource);        
    }

See anything wrong? Parsing is occurring on the XMLReader instance, not at the end of the pipeline chain. In addition, the NamespaceFilter instance sets its parent to the XMLReader, instead of the XMLWriter instance that should precede it in the chain. These errors are not obvious, and will throw your intended pipeline into chaos. In this example, no filtering will occur at all, because parsing occurs on the reader, not the filters. If you correct that error, you still won’t get output, as the writer is left out of the pipeline through improper setting of the NamespaceFilter’s parent. Setting the parent properly sets you up, though, and you’ll finally get the behavior you expected in the first place. Be very careful with parentage and parsing when handling SAX pipelines.
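The corrected wiring can be sketched with nothing but the standard XMLFilterImpl helper class. This is an illustration of the parentage rules rather than a fix of the listing itself: TaggingFilter is a hypothetical stand-in for XMLWriter and NamespaceFilter, and PipelineDemo and parseThroughPipeline are invented names. The parser is assumed to be the one bundled with the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class showing filter parentage
class PipelineDemo {

    // Stand-in filter: appends a tag to each element name so that
    // you can see which filters an event passed through, and in what order
    static class TaggingFilter extends XMLFilterImpl {
        private final String tag;

        TaggingFilter(XMLReader parent, String tag) {
            super(parent);
            this.tag = tag;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts)
            throws SAXException {
            super.startElement(uri, localName, qName + tag, atts);
        }
    }

    static List<String> parseThroughPipeline(String xml) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // The first filter's parent is the reader...
        XMLReader first = new TaggingFilter(reader, "-a");
        // ...and each later filter's parent is the filter before it
        XMLReader last = new TaggingFilter(first, "-b");

        final List<String> seen = new ArrayList<>();

        // Register handlers on the END of the chain, not on the reader
        last.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                seen.add(qName);
            }
        });

        // Parse on the END of the chain as well
        last.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

Every element name that reaches the content handler carries both tags, so you can confirm that each event passed through both filters. Applied to the listing above, the same pattern would mean giving the NamespaceFilter the XMLWriter as its parent and calling parse( ) on the NamespaceFilter. Note that the handler is registered on the last filter too: XMLFilterImpl points its parent’s handlers at itself when parse( ) is called, so for filters built on that class, a handler set directly on the underlying reader would be replaced.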
