Book description

This concise book gives you the information you need to effectively use version 2 of the Simple API for XML (SAX2), the dominant API for efficient XML processing with Java. With the SAX2 API, developers can access the information in XML documents as they are read, without major memory constraints or a large code footprint. SAX2 is often used by other APIs "under the covers," and provides a foundation for processing and creating both XML and non-XML information. While generally considered the most efficient approach to XML document parsing, SAX2 also carries a significant learning curve.

In SAX2, author David Brownell explores the many details of managing XML parsers, filtering the information those parsers return, generating your own SAX2 events to convert non-XML information to an XML form, and developing strategies for using event-based parsing in a variety of application scenarios.

Created in a public process by the XML-Dev mailing list, the SAX2 API is compact and highly functional. SAX2 uses callbacks to report the information in an XML document as the document is read, allowing you to create your own program structures around the content of documents. No intermediary model of an entire XML document is necessary, and the mapping from XML structures to Java structures and back is straightforward.

Both developers learning about SAX2 for the first time and developers returning for reference and advanced material will find useful information in this book. Chapters provide detailed explanations and examples of many different aspects of SAX2 development, while appendices provide a reference to the API and an explanation of the relationships between the SAX2 API and the XML Information Set.

While the core of the API is quite approachable, many of its more advanced features are both obscure and powerful. You can use SAX2 to filter, modify, and restructure information in layers of processing, which makes it easy to reuse generic tools. SAX2 also has some significant limitations that applications need to address in their own ways. This book gives you the detail and examples required to use SAX2 to its full potential, taking advantage of its power while avoiding its limitations.
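
To make the callback model described above concrete, here is a minimal sketch, not taken from the book. It assumes a JAXP-capable Java runtime and uses a hypothetical class name, `SaxEventDemo`, with a hypothetical helper, `collectEvents`. The handler simply records each event it is given; no tree of the whole document is ever built.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: logs the SAX2 callbacks produced while parsing.
public class SaxEventDemo {

    /** Parses the XML text and returns a log of the callbacks received, in order. */
    static List<String> collectEvents(String xml) throws Exception {
        final List<String> events = new ArrayList<>();

        // DefaultHandler supplies no-op versions of every callback,
        // so only the events of interest need to be overridden.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                events.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one run of text in several calls;
                // this sketch logs each non-blank chunk separately.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text:" + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };

        // JAXP bootstrapping; the factory's default is not namespace-aware,
        // so element names arrive in the qName argument.
        XMLReader reader = SAXParserFactory.newInstance()
                .newSAXParser()
                .getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : collectEvents("<book><title>SAX2</title></book>")) {
            System.out.println(event);
        }
    }
}
```

Because events arrive while the document is still being read, the handler decides what to keep and in what structure; here that structure is just a list of strings.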

Table of Contents

  1. SAX2
    1. Preface
      1. Who Should Read This Book?
      2. Organization of This Book
      3. Conventions Used in This Book
      4. How to Contact Us
      5. Acknowledgments
    2. 1. The Simple API for XML
      1. Types of XML APIs
      2. Why Choose SAX?
        1. Stream-Based Processing
        2. Data Structure Flexibility
        3. Memory Consumption with SAX and DOM
        4. Other Reasons to Prefer SAX
      3. Why Not to Choose SAX?
      4. A Short History of SAX
        1. SAX1
        2. SAX2
        3. SAX2 Extensions
        4. Is SAX2 a “Standard”?
        5. Sun’s Java API for XML Processing (JAXP)
      5. Packages in the SAX2 API
      6. Some Popular SAX2 Parser Distributions
        1. Ælfred2
        2. Crimson
        3. Xerces
      7. Installing a SAX2 Parser
      8. What XML Are We Talking About?
    3. 2. Introducing SAX2
      1. Producers and Consumers
      2. Beginning SAX
        1. How Do the Parts Fit Together?
        2. What Are the SAX2 Event Handlers?
        3. XMLWriter: an Event Consumer
          1. Event pipelines
          2. Concerns when writing XML text
      3. Basic ContentHandler Events
        1. The DefaultHandler Class
        2. Example: Elements and Text
        3. The Attributes Interface
          1. Attribute lookup by name
          2. Attribute lookup by index
          3. Other attributes issues
        4. Essential ContentHandler Callbacks
      4. Producer-Side Validation
        1. SAX2 Feature Flags
        2. Handling Validity Errors
      5. Exception Handling
        1. SAX2 Exception Classes
        2. ErrorHandler Interface
        3. Errors and Diagnostics
      6. Namespaces and SAX2
        1. What Namespaces Do to XML
        2. Element and Attribute Naming with Namespaces
          1. Element naming
          2. Attribute naming
          3. Things to keep in mind
        3. Namespace Feature Flags
        4. ContentHandler and Prefix Mappings
    4. 3. Producing SAX2 Events
      1. Pull Mode Event Production with XMLReader
        1. The XMLReader Interface
        2. The InputSource Class
          1. Always provide absolute URIs
          2. Providing entity text
        3. Filenames Versus URIs
      2. Bootstrapping an XMLReader
        1. The XMLReaderFactory Class
        2. Calling Parser Constructors
        3. Using JAXP
      3. Configuring XMLReader Behavior
        1. XMLReader Properties
        2. XMLReader Feature Flags
      4. The EntityResolver Interface
      5. Other Kinds of SAX2 Event Producers
        1. DOM-to-SAX Event Production (and DOM4J, JDOM)
          1. Turning DOM trees into SAX events
          2. Turning DOM4J trees into SAX events
          3. Turning JDOM trees into SAX events
        2. Push Mode Event Production
          1. Turning CSV files into SAX events
          2. Turning objects into SAX events
          3. Data modeling concerns
        3. Producing Well-Formed Event Streams
        4. The XMLFilter Interface
    5. 4. Consuming SAX2 Events
      1. More About ContentHandler
        1. Other ContentHandler Methods
        2. The Locator Interface
        3. Internationalization Concerns
      2. The LexicalHandler Interface
      3. Exposing DTD Information
        1. The DeclHandler Interface
        2. The DTDHandler Interface
      4. Turning SAX Events into Data Structures
        1. SAX-to-DOM Consumers
        2. Pruning Noise Data from a DOM Tree
        3. Building a Partial DOM
        4. Turning SAX Events into Custom Data Structures
      5. XML Pipelines
        1. The XMLFilterImpl Class
        2. XMLFilter Examples
        3. The javax.xml.transform.sax Package
          1. SAX in Push-Mode with XSLT
          2. SAX in Pull-Mode with XSLT
        4. The gnu.xml.pipeline Framework
    6. 5. Other SAX Classes
      1. Helper Classes
        1. The AttributesImpl Class
        2. The LocatorImpl Class
        3. The NamespaceSupport Class
      2. SAX1 Support
    7. 6. Putting It All Together
      1. Rich Site Summary: RSS
        1. Data Model for RSS Classic
        2. Consuming and Producing RSS Parsing Events
        3. Building Applications with RSS
      2. XML and Messaging
        1. XML/Internet Versus Older Technologies
        2. Roles for Java in XML Messaging
        3. XML Messaging over HTTP with SAX2
      3. Including Subdocuments
    8. A. SAX2 API Summary
      1. The org.xml.sax Package
        1. The AttributeList Interface
        2. The Attributes Interface
        3. The ContentHandler Interface
        4. The DocumentHandler Interface
        5. The DTDHandler Interface
        6. The EntityResolver Interface
        7. The ErrorHandler Interface
        8. The HandlerBase Class
        9. The InputSource Class
        10. The Locator Interface
        11. The Parser Interface
        12. SAXException
        13. SAXNotRecognizedException
        14. SAXNotSupportedException
        15. SAXParseException
        16. The XMLFilter Interface
        17. The XMLReader Interface
      2. The org.xml.sax.helpers Package
        1. The AttributeListImpl Class
        2. The AttributesImpl Class
        3. The DefaultHandler Class
        4. The LocatorImpl Class
        5. The NamespaceSupport Class
        6. The ParserAdapter Class
        7. The ParserFactory Class
        8. The XMLFilterImpl Class
        9. The XMLReaderAdapter Class
        10. The XMLReaderFactory Class
      3. The org.xml.sax.ext Package
        1. The DeclHandler Interface
        2. The LexicalHandler Interface
    9. B. SAX2 and the XML Infoset
      1. Event Producer Issues
      2. Event Consumer Issues
        1. Structural Issues
        2. Base URIs, xml:base, and Locator Data
      3. Document Information Item
      4. Element Information Items
      5. Attribute Information Items
      6. Processing Instruction Information Items
      7. Unexpanded Entity Reference Information Items
      8. Character Information Items
      9. Comment Information Items
      10. Document Type Declaration Information Item
      11. Unparsed Entity Information Items
      12. Notation Information Items
      13. Namespace Information Items
    10. Index
    11. Colophon