You are previewing Learning Java, 4th Edition.

Learning Java, 4th Edition

Cover of Learning Java, 4th Edition by Daniel Leuck... Published by O'Reilly Media, Inc.
  1. Learning Java
  2. Preface
    1. Who Should Read This Book
    2. New Developments
      1. New in This Edition (Java 6 and 7)
    3. Using This Book
    4. Online Resources
    5. Conventions Used in This Book
    6. Using Code Examples
    7. Safari® Books Online
    8. How to Contact Us
    9. Acknowledgments
  3. 1. A Modern Language
    1. Enter Java
      1. Java’s Origins
      2. Growing Up
    2. A Virtual Machine
    3. Java Compared with Other Languages
    4. Safety of Design
      1. Simplify, Simplify, Simplify...
      2. Type Safety and Method Binding
      3. Incremental Development
      4. Dynamic Memory Management
      5. Error Handling
      6. Threads
      7. Scalability
    5. Safety of Implementation
      1. The Verifier
      2. Class Loaders
      3. Security Managers
    6. Application and User-Level Security
    7. A Java Road Map
      1. The Past: Java 1.0–Java 1.6
      2. The Present: Java 7
      3. The Future
      4. Availability
  4. 2. A First Application
    1. Java Tools and Environment
    2. Configuring Eclipse and Creating a Project
      1. Importing the Learning Java Examples
    3. HelloJava
      1. Classes
      2. The main() Method
      3. Classes and Objects
      4. Variables and Class Types
      5. HelloComponent
      6. Inheritance
      7. The JComponent Class
      8. Relationships and Finger Pointing
      9. Package and Imports
      10. The paintComponent() Method
    4. HelloJava2: The Sequel
      1. Instance Variables
      2. Constructors
      3. Events
      4. The repaint() Method
      5. Interfaces
    5. HelloJava3: The Button Strikes!
      1. Method Overloading
      2. Components
      3. Containers
      4. Layout
      5. Subclassing and Subtypes
      6. More Events and Interfaces
      7. Color Commentary
      8. Static Members
      9. Arrays
      10. Our Color Methods
    6. HelloJava4: Netscape’s Revenge
      1. Threads
      2. The Thread Class
      3. The Runnable Interface
      4. Starting the Thread
      5. Running Code in the Thread
      6. Exceptions
      7. Synchronization
  5. 3. Tools of the Trade
    1. JDK Environment
    2. The Java VM
    3. Running Java Applications
      1. System Properties
    4. The Classpath
      1. javap
    5. The Java Compiler
    6. JAR Files
      1. File Compression
      2. The jar Utility
      3. The pack200 Utility
    7. Policy Files
      1. The Default Security Manager
      2. The policytool Utility
      3. Using a Policy File with the Default Security Manager
  6. 4. The Java Language
    1. Text Encoding
      1. Javadoc Comments
    3. Types
      1. Primitive Types
      2. Reference Types
      3. A Word About Strings
    4. Statements and Expressions
      1. Statements
      2. Expressions
    5. Exceptions
      1. Exceptions and Error Classes
      2. Exception Handling
      3. Bubbling Up
      4. Stack Traces
      5. Checked and Unchecked Exceptions
      6. Throwing Exceptions
      7. try Creep
      8. The finally Clause
      9. Try with Resources
      10. Performance Issues
    6. Assertions
      1. Enabling and Disabling Assertions
      2. Using Assertions
    7. Arrays
      1. Array Types
      2. Array Creation and Initialization
      3. Using Arrays
      4. Anonymous Arrays
      5. Multidimensional Arrays
      6. Inside Arrays
  7. 5. Objects in Java
    1. Classes
      1. Accessing Fields and Methods
      2. Static Members
    2. Methods
      1. Local Variables
      2. Shadowing
      3. Static Methods
      4. Initializing Local Variables
      5. Argument Passing and References
      6. Wrappers for Primitive Types
      7. Autoboxing and Unboxing of Primitives
      8. Variable-Length Argument Lists
      9. Method Overloading
    3. Object Creation
      1. Constructors
      2. Working with Overloaded Constructors
      3. Static and Nonstatic Initializer Blocks
    4. Object Destruction
      1. Garbage Collection
      2. Finalization
      3. Weak and Soft References
    5. Enumerations
      1. Enum Values
      2. Customizing Enumerations
  8. 6. Relationships Among Classes
    1. Subclassing and Inheritance
      1. Shadowed Variables
      2. Overriding Methods
      3. Special References: this and super
      4. Casting
      5. Using Superclass Constructors
      6. Full Disclosure: Constructors and Initialization
      7. Abstract Methods and Classes
    2. Interfaces
      1. Interfaces as Callbacks
      2. Interface Variables
      3. Subinterfaces
    3. Packages and Compilation Units
      1. Compilation Units
      2. Package Names
      3. Class Visibility
      4. Importing Classes
    4. Visibility of Variables and Methods
      1. Basic Access Modifiers
      2. Subclasses and Visibility
      3. Interfaces and Visibility
    5. Arrays and the Class Hierarchy
      1. ArrayStoreException
    6. Inner Classes
      1. Inner Classes as Adapters
      2. Inner Classes Within Methods
  9. 7. Working with Objects and Classes
    1. The Object Class
      1. Equality and Equivalence
      2. Hashcodes
      3. Cloning Objects
    2. The Class Class
    3. Reflection
      1. Modifiers and Security
      2. Accessing Fields
      3. Accessing Methods
      4. Accessing Constructors
      5. What About Arrays?
      6. Accessing Generic Type Information
      7. Accessing Annotation Data
      8. Dynamic Interface Adapters
      9. What Is Reflection Good For?
    4. Annotations
      1. Using Annotations
      2. Standard Annotations
      3. The apt Tool
  10. 8. Generics
    1. Containers: Building a Better Mousetrap
      1. Can Containers Be Fixed?
    2. Enter Generics
      1. Talking About Types
    3. “There Is No Spoon”
      1. Erasure
      2. Raw Types
    4. Parameterized Type Relationships
      1. Why Isn’t a List<Date> a List<Object>?
    5. Casts
    6. Writing Generic Classes
      1. The Type Variable
      2. Subclassing Generics
      3. Exceptions and Generics
      4. Parameter Type Limitations
    7. Bounds
      1. Erasure and Bounds (Working with Legacy Code)
    8. Wildcards
      1. A Supertype of All Instantiations
      2. Bounded Wildcards
      3. Thinking Outside the Container
      4. Lower Bounds
      5. Reading, Writing, and Arithmetic
      6. <?>, <Object>, and the Raw Type
      7. Wildcard Type Relationships
    9. Generic Methods
      1. Generic Methods Introduced
      2. Type Inference from Arguments
      3. Type Inference from Assignment Context
      4. Explicit Type Invocation
      5. Wildcard Capture
      6. Wildcard Types Versus Generic Methods
    10. Arrays of Parameterized Types
      1. Using Array Types
      2. What Good Are Arrays of Generic Types?
      3. Wildcards in Array Types
    11. Case Study: The Enum Class
    12. Case Study: The sort() Method
    13. Conclusion
  11. 9. Threads
    1. Introducing Threads
      1. The Thread Class and the Runnable Interface
      2. Controlling Threads
      3. Death of a Thread
    2. Threading an Applet
      1. Issues Lurking
    3. Synchronization
      1. Serializing Access to Methods
      2. Accessing class and instance Variables from Multiple Threads
      3. The wait() and notify() Methods
      4. Passing Messages
      5. ThreadLocal Objects
    4. Scheduling and Priority
      1. Thread State
      2. Time-Slicing
      3. Priorities
      4. Yielding
    5. Thread Groups
      1. Working with ThreadGroups
      2. Uncaught Exceptions
    6. Thread Performance
      1. The Cost of Synchronization
      2. Thread Resource Consumption
    7. Concurrency Utilities
      1. Executors
      2. Locks
      3. Synchronization Constructs
      4. Atomic Operations
    8. Conclusion
  12. 10. Working with Text
    1. Text-Related APIs
    2. Strings
      1. Constructing Strings
      2. Strings from Things
      3. Comparing Strings
      4. Searching
      5. Editing
      6. String Method Summary
      7. StringBuilder and StringBuffer
    3. Internationalization
      1. The java.util.Locale Class
      2. Resource Bundles
    4. Parsing and Formatting Text
      1. Parsing Primitive Numbers
      2. Tokenizing Text
    5. Printf-Style Formatting
      1. Formatter
      2. The Format String
      3. String Conversions
      4. Primitive and Numeric Conversions
      5. Flags
      6. Miscellaneous
    6. Formatting with the java.text Package
      1. MessageFormat
    7. Regular Expressions
      1. Regex Notation
      2. The java.util.regex API
  13. 11. Core Utilities
    1. Math Utilities
      1. The java.lang.Math Class
      2. Big/Precise Numbers
      3. Floating-Point Components
      4. Random Numbers
    2. Dates and Times
      1. Working with Calendars
      2. Time Zones
      3. Parsing and Formatting with DateFormat
      4. Printf-Style Date and Time Formatting
    3. Timers
    4. Collections
      1. The Collection Interface
      2. Iterator
      3. Collection Types
      4. The Map Interface
      5. Collection Implementations
      6. Hash Codes and Key Values
      7. Synchronized and Unsynchronized Collections
      8. Read-Only and Read-Mostly Collections
      9. WeakHashMap
      10. EnumSet and EnumMap
      11. Sorting Collections
      12. A Thrilling Example
    5. Properties
      1. Loading and Storing
      2. System Properties
    6. The Preferences API
      1. Preferences for Classes
      2. Preferences Storage
      3. Change Notification
    7. The Logging API
      1. Overview
      2. Logging Levels
      3. A Simple Example
      4. Logging Setup Properties
      5. The Logger
      6. Performance
    8. Observers and Observables
  14. 12. Input/Output Facilities
    1. Streams
      1. Basic I/O
      2. Character Streams
      3. Stream Wrappers
      4. Pipes
      5. Streams from Strings and Back
      6. Implementing a Filter Stream
    2. File I/O
      1. The Class
      2. File Streams
      3. RandomAccessFile
      4. Resource Paths
    3. The NIO File API
      1. FileSystem and Path
      2. NIO File Operations
      3. Directory Operations
      4. Watching Paths
    4. Serialization
      1. Initialization with readObject()
      2. SerialVersionUID
    5. Data Compression
      1. Archives and Compressed Data
      2. Decompressing Data
      3. Zip Archive As a Filesystem
    6. The NIO Package
      1. Asynchronous I/O
      2. Performance
      3. Mapped and Locked Files
      4. Channels
      5. Buffers
      6. Character Encoders and Decoders
      7. FileChannel
      8. Scalable I/O with NIO
  15. 13. Network Programming
    1. Sockets
      1. Clients and Servers
      2. author="pat” timestamp="20120926T110720-0500” comment="one of those sections I hate to get rid of but is less relevant in terms of the example... should probably find a more modern example...”The DateAtHost Client
      3. The TinyHttpd Server
      4. Socket Options
      5. Proxies and Firewalls
    2. Datagram Sockets
      1. author="pat” timestamp="20120926T141346-0500” comment="I actually rewrote this as a standalone client but then decided to leave it as an applet”The HeartBeat Applet
      2. InetAddress
    3. Simple Serialized Object Protocols
      1. A Simple Object-Based Server
    4. Remote Method Invocation
      1. Real-World Usage
      2. Remote and Nonremote Objects
      3. An RMI Example
      4. RMI and CORBA
    5. Scalable I/O with NIO
      1. Selectable Channels
      2. Using Select
      3. LargerHttpd
      4. Nonblocking Client-Side Operations
  16. 14. Programming for the Web
    1. Uniform Resource Locators (URLs)
    2. The URL Class
      1. Stream Data
      2. Getting the Content as an Object
      3. Managing Connections
      4. Handlers in Practice
      5. Useful Handler Frameworks
    3. Talking to Web Applications
      1. Using the GET Method
      2. Using the POST Method
      3. The HttpURLConnection
      4. SSL and Secure Web Communications
      5. URLs, URNs, and URIs
    4. Web Services
      1. XML-RPC
      2. WSDL
      3. The Tools
      4. The Weather Service Client
  17. 15. Web Applications and Web Services
    1. Web Application Technologies
      1. Page-Oriented Versus “Single Page” Applications
      2. JSPs
      3. XML and XSL
      4. Web Application Frameworks
      5. Google Web Toolkit
      6. HTML5, AJAX, and More...
    2. Java Web Applications
      1. The Servlet Lifecycle
      2. Servlets
      3. The HelloClient Servlet
      4. The Servlet Response
      5. Servlet Parameters
      6. The ShowParameters Servlet
      7. User Session Management
      8. The ShowSession Servlet
      9. The ShoppingCart Servlet
      10. Cookies
      11. The ServletContext API
      12. Asynchronous Servlets
    3. WAR Files and Deployment
      1. Configuration with web.xml and Annotations
      2. URL Pattern Mappings
      3. Deploying HelloClient
      4. Error and Index Pages
      5. Security and Authentication
      6. Protecting Resources with Roles
      7. Secure Data Transport
      8. Authenticating Users
      9. Procedural Authorization
    4. Servlet Filters
      1. A Simple Filter
      2. A Test Servlet
      3. Declaring and Mapping Filters
      4. Filtering the Servlet Request
      5. Filtering the Servlet Response
    5. Building WAR Files with Ant
      1. A Development-Oriented Directory Layout
      2. Deploying and Redeploying WARs with Ant
    6. Implementing Web Services
      1. Defining the Service
      2. Our Echo Service
      3. Using the Service
      4. Data Types
    7. Conclusion
  18. 16. Swing
    1. Components
      1. Peers and Look-and-Feel
      2. The MVC Framework
      3. Painting
      4. Enabling and Disabling Components
      5. Focus, Please
      6. Other Component Methods
      7. Layout Managers
      8. Insets
      9. Z-Ordering (Stacking Components)
      10. The revalidate() and doLayout() Methods
      11. Managing Components
      12. Listening for Components
      13. Windows, Frames and Splash Screens
      14. Other Methods for Controlling Frames
      15. Content Panes
      16. Desktop Integration
    2. Events
      1. Event Receivers and Listener Interfaces
      2. Event Sources
      3. Event Delivery
      4. Event Types
      5. The java.awt.event.InputEvent Class
      6. Mouse and Key Modifiers on InputEvents
      7. Focus Events
    3. Event Summary
      1. Adapter Classes
      2. Dummy Adapters
    4. The AWT Robot!
    5. Multithreading in Swing
  19. 17. Using Swing Components
    1. Buttons and Labels
      1. HTML Text in Buttons and Labels
    2. Checkboxes and Radio Buttons
    3. Lists and Combo Boxes
    4. The Spinner
    5. Borders
    6. Menus
    7. Pop-Up Menus
      1. Component-Managed Pop Ups
    8. The JScrollPane Class
    9. The JSplitPane Class
    10. The JTabbedPane Class
    11. Scrollbars and Sliders
    12. Dialogs
      1. File Selection Dialog
      2. The Color Chooser
  20. 18. More Swing Components
    1. Text Components
      1. The TextEntryBox Application
      2. Formatted Text
      3. Filtering Input
      4. Validating Data
      5. Say the Magic Word
      6. Sharing a Data Model
      7. HTML and RTF for Free
      8. Managing Text Yourself
    2. Focus Navigation
      1. Trees
      2. Nodes and Models
      3. Save a Tree
      4. Tree Events
      5. A Complete Example
    3. Tables
      1. A First Stab: Freeloading
      2. Round Two: Creating a Table Model
      3. Round Three: A Simple Spreadsheet
      4. Sorting and Filtering
      5. Printing JTables
    4. Desktops
    5. Pluggable Look-and-Feel
    6. Creating Custom Components
      1. Generating Events
      2. A Dial Component
      3. Model and View Separation
  21. 19. Layout Managers
    1. FlowLayout
    2. GridLayout
    3. BorderLayout
    4. BoxLayout
    5. CardLayout
    6. GridBagLayout
      1. The GridBagConstraints Class
      2. Grid Coordinates
      3. The fill Constraint
      4. Spanning Rows and Columns
      5. Weighting
      6. Anchoring
      7. Padding and Insets
      8. Relative Positioning
      9. Composite Layouts
    7. Other Layout Managers
    8. Absolute Positioning
  22. 20. Drawing with the 2D API
    1. The Big Picture
    2. The Rendering Pipeline
    3. A Quick Tour of Java 2D
      1. Filling Shapes
      2. Drawing Shape Outlines
      3. Convenience Methods
      4. Drawing Text
      5. Drawing Images
      6. The Whole Iguana
    4. Filling Shapes
      1. Solid Colors
      2. Color Gradients
      3. Textures
      4. Desktop Colors
    5. Stroking Shape Outlines
    6. Using Fonts
      1. Font Metrics
    7. Displaying Images
      1. The Image Class
      2. Image Observers
      3. Scaling and Size
    8. Drawing Techniques
      1. Double Buffering
      2. Limiting Drawing with Clipping
      3. Offscreen Drawing
    9. Printing
  23. 21. Working with Images and Other Media
    1. Loading Images
      1. ImageObserver
      2. MediaTracker
      3. ImageIcon
      4. ImageIO
    2. Producing Image Data
      1. Drawing Animations
      2. BufferedImage Anatomy
      3. Color Models
      4. Creating an Image
      5. Updating a BufferedImage
    3. Filtering Image Data
      1. How ImageProcessor Works
      2. Converting an Image to a BufferedImage
      3. Using the RescaleOp Class
      4. Using the AffineTransformOp Class
    4. Saving Image Data
    5. Simple Audio
    6. Java Media Framework
  24. 22. JavaBeans
    1. What’s a Bean?
      1. What Constitutes a Bean?
    2. The NetBeans IDE
      1. Installing and Running NetBeans
    3. Properties and Customizers
    4. Event Hookups and Adapters
      1. Taming the Juggler
      2. Molecular Motion
    5. Binding Properties
      1. Constraining Properties
    6. Building Beans
      1. The Dial Bean
      2. Design Patterns for Properties
    7. Limitations of Visual Design
    8. Serialization Versus Code Generation
    9. Customizing with BeanInfo
      1. Getting Properties Information
    10. Handcoding with Beans
      1. Bean Instantiation and Type Management
      2. Working with Serialized Beans
      3. Runtime Event Hookups with Reflection
    11. BeanContext and BeanContextServices
    12. The Java Activation Framework
    13. Enterprise JavaBeans and POJO-Based Enterprise Frameworks
  25. 23. Applets
    1. The Politics of Browser-Based Applications
    2. Applet Support and the Java Plug-in
    3. The JApplet Class
      1. Applet Lifecycle
      2. The Applet Security Sandbox
      3. Getting Applet Resources
      4. The <applet> Tag
      5. Attributes
      6. Parameters
      7. ¿Habla Applet?
      8. The Complete <applet> Tag
      9. Loading Class Files
      10. Packages
      11. appletviewer
    4. Java Web Start
    5. Conclusion
  26. 24. XML
    1. The Butler Did It
    2. A Bit of Background
      1. Text Versus Binary
      2. A Universal Parser
      3. The State of XML
      4. The XML APIs
      5. XML and Web Browsers
    3. XML Basics
      1. Attributes
      2. XML Documents
      3. Encoding
      4. Namespaces
      5. Validation
      6. HTML to XHTML
    4. SAX
      1. The SAX API
      2. Building a Model Using SAX
      3. XMLEncoder/Decoder
    5. DOM
      1. The DOM API
      2. Test-Driving DOM
      3. Generating XML with DOM
      4. JDOM
    6. XPath
      1. Nodes
      2. Predicates
      3. Functions
      4. The XPath API
      5. XMLGrep
    7. XInclude
      1. Enabling XInclude
    8. Validating Documents
      1. Using Document Validation
      2. DTDs
      3. XML Schema
      4. The Validation API
    9. JAXB Code Binding and Generation
      1. Annotating Our Model
      2. Generating a Java Model from an XML Schema
      3. Generating an XML Schema from a Java Model
    10. Transforming Documents with XSL/XSLT
      1. XSL Basics
      2. Transforming the Zoo Inventory
      3. XSLTransform
      4. XSL in the Browser
    11. Web Services
    12. The End of the Book
  27. A. The Eclipse IDE
    1. The IDE Wars
    2. Getting Started with Eclipse
      1. Importing the Learning Java Examples
    3. Using Eclipse
      1. Getting at the Source
      2. The Lay of the Land
      3. Running the Examples
      4. Building the Ant-Based Examples
      5. Loner Examples
    4. Eclipse Features
      1. Coding Shortcuts
      2. Autocorrection
      3. Refactoring
      4. Diffing Files
      5. Organizing Imports
      6. Formatting Source Code
    5. Conclusion
  28. B. BeanShell: Java Scripting
    1. Running BeanShell
    2. Java Statements and Expressions
      1. Imports
    3. BeanShell Commands
    4. Scripted Methods and Objects
      1. Scripting Interfaces and Adapters
    5. Changing the Classpath
    6. Learning More . . .
  29. Glossary
  30. Index
  31. About the Authors
  32. Colophon
  33. Copyright
O'Reilly logo

Validating Documents

Words, words, mere words, no matter from the heart.

William Shakespeare, Troilus and Cressida

In this section, we talk about DTDs and XML Schema, two ways to enforce rules in an XML document. A DTD is a simple grammar guide for an XML document, defining which tags may appear where, in what order, with what attributes, etc. XML Schema is the next generation of DTD. With XML Schema, you can describe the data content of the document as well as the structure. XML Schemas are written in terms of primitives, such as numbers, dates, and simple regular expressions, and also allow the user to define complex types in a grammar-like fashion. The word schema means a blueprint or plan for structure, so we’ll refer to DTDs and XML Schema collectively as schema where either applies.

DTDs, although much more limited in capability, are still widely used. This may be partly due to the complexity involved in writing XML Schemas by hand. The W3C XML Schema standard is verbose and cumbersome, which may explain why several alternative syntaxes have sprung up. The javax.xml.validation API performs XML validation in a pluggable way. Out of the box, it supports only W3C XML Schema, but new schema languages can be added in the future. Validating with a DTD is supported as an older feature directly in the SAX parser. We’ll use both in this section.

Using Document Validation

XML’s validation of documents is a key piece of what makes it useful as a data format. Using a schema is somewhat analogous to the way Java classes enforce type checking in the language. A schema defines document types. Documents conforming to a given schema are often referred to as instance documents of the schema.

This type safety provides a layer of protection that eliminates having to write complex error-checking code. However, validation may not be necessary in every environment. For example, when the same tool generates XML and reads it back in a short time span, validation may not be necessary. It is invaluable, though, during development. Sometimes document validation is used during development and turned off in production environments.


The DTD language is fairly simple. A DTD is primarily a set of special tags that define each element in the document and, for complex types, provide a list of the elements it may contain. The DTD <!ELEMENT> tag consists of the name of the tag and either a special keyword for the data type or a parenthesized list of elements.

<!ELEMENT Document ( Head, Body )>

The special identifier #PCDATA (parsed character data) indicates a string. When a list is provided, the elements are expected to appear in that order. The list may contain sublists, and items may be made optional using a vertical bar (|) as an OR operator. Special notation can also be used to indicate how many of each item may appear; two examples of this notation are shown in Table 24-4.

Table 24-4. DTD notation defining occurrences




Zero or more occurrences


Zero or one occurrences

Attributes of an element are defined with the <!ATTLIST> tag. This tag enables the DTD to enforce rules about attributes. It accepts a list of identifiers and a default value:

<!ATTLIST Animal animalClass (unknown | mammal | reptile) "unknown">

This ATTLIST says that the animal element has an animalClass attribute that can have one of several values (e.g.: unknown, mammal, reptile). The default is unknown.

We won’t cover everything you can do with DTDs here. But the following example will guarantee zooinventory.xml follows the format we’ve described. Place the following in a file called zooinventory.dtd (or grab this file from

<!ELEMENT inventory ( animal* )>
<!ELEMENT animal ( name, species, habitat, (food | foodRecipe), temperament, 
    weight )>
<!ATTLIST animal animalClass ( unknown | mammal | reptile | bird | fish ) 
<!ELEMENT name ( #PCDATA )>
<!ELEMENT species ( #PCDATA )>
<!ELEMENT habitat ( #PCDATA )>
<!ELEMENT food ( #PCDATA )>
<!ELEMENT weight ( #PCDATA )>
<!ELEMENT foodRecipe ( name, ingredient+ )>
<!ELEMENT ingredient ( #PCDATA )>
<!ELEMENT temperament ( #PCDATA )>

The DTD says that an inventory consists of any number of animal elements. An animal has a name, species, and habitat tag followed by either a food or foodRecipe. foodRecipe’s structure is further defined later.

To use a DTD, we associate it with the XML document. We can do this by placing a DOCTYPE declaration in the XML document itself and allow the XML parser to recognize and enforce it. The Java validation API that we’ll talk about in the next section separates the roles of parsing and validation and can be used to validate arbitrary XML against any kind of schema, including DTDs. The problem is that out of the box, the validation API only implements the (newer) XML schema syntax. So we’ll have to rely on the parser to validate the DTD for us here.

In this case, when a validating parser encounters the DOCTYPE, it attempts to load the DTD and validate the document. There are several forms the DOCTYPE can have, but the one we’ll use is:

<!DOCTYPE Inventory SYSTEM "zooinventory.dtd">

Both SAX and DOM parsers can automatically validate documents as they read them, provided that the documents contain a DOCTYPE declaration. However, you have to explicitly ask the parser factory to provide a parser that is capable of validation. To do this, just set the validating property of the parser factory to true before you ask it for an instance of the parser. For example:

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating( true );

Again, this setValidating() method is an older, more simplistic way to enable validation of documents that contain DTD references and it is tied to the parser. The new validation package that we’ll discuss later is independent of the parser and more flexible. You should not use the parser-validating method in combination with the new validation API unless you want to validate documents twice for some reason.

Try inserting the setValidating() line in our model builder example after the factory is created. Abuse the zooinventory.xml file by adding or removing an element or attribute and then see what happens when you run the example. You should get useful error messages from the parser indicating the problems and parsing should fail. To get more information about the validation, we can register an org.xml.sax.ErrorHandler object with the parser, but by default, Java installs one that simply prints the errors for us.

XML Schema

Although DTDs can define the basic structure of an XML document, they don’t provide a very rich vocabulary for describing the relationships between elements and say very little about their content. For example, there is no reasonable way with DTDs to specify that an element is to contain a numeric type or even to govern the length of string data. The XML Schema standard addresses both the structural and data content of an XML document. It is the next logical step and it (or one of the competing schema languages with similar capabilities) should replace DTDs in the future.

XML Schema brings the equivalent of strong typing to XML by drawing on many predefined primitive element types and allowing users to define new complex types of their own. These schemas even allow for types to be extended and used polymorphically, like types in the Java language. Although we can’t cover XML Schema in any detail, we’ll present the equivalent W3C XML Schema for our zooinventory.xml file here:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="">

<xs:element name="inventory">
       <xs:element maxOccurs="unbounded" ref="animal"/>

<xs:element name="name" type="xs:string"/>

<xs:element name="animal">
      <xs:element ref="name"/>
      <xs:element name="species" type="xs:string"/>
      <xs:element name="habitat" type="xs:string"/>
         <xs:element name="food" type="xs:string"/>
         <xs:element ref="foodRecipe"/>
      <xs:element name="temperament" type="xs:string"/>
      <xs:element name="weight" type="xs:double"/>
    <xs:attribute name="animalClass" default="unknown">
        <xs:restriction base="xs:token">
          <xs:enumeration value="unknown"/>
          <xs:enumeration value="mammal"/>
          <xs:enumeration value="reptile"/>
          <xs:enumeration value="bird"/>

<xs:element name="foodRecipe">
      <xs:element ref="name"/>
      <xs:element maxOccurs="unbounded" name="ingredient" type="xs:string"/>


This schema would normally be placed into an XML Schema Definition file, which has a .xsd extension. The first thing to note is that this schema file is a normal, well-formed XML file that uses elements from the W3C XML Schema namespace. In it, we use nested element declarations to define the elements that will appear in our document. As with most languages, there is more than one way to accomplish this task. Here, we have broken out the “complex” animal and foodRecipe elements into their own separate element declarations and referred to them in their parent elements using the ref attribute. In this case, we did it mainly for readability; it would have been legal to have one big, deeply nested element declaration starting at inventory. However, referring to elements by reference in this way also allows us to reuse the same element declaration in multiple places in the document, if needed. Our name element is a small example of this. Although it didn’t do much for us here, we have broken out the name element and referred to it for both the Animal/Name and the FoodRecipe/Name. Breaking out name like this would allow us to use more advanced features of schema and write rules for what a name can be (e.g., how long, what kind of characters are allowed) in one place and reuse that “type” where needed.

Control directives like sequence and choice allow us to define the structure of the child elements allowed and attributes like minOccurs and maxOccurs let us specify cardinality (how many instances). The sequence directive says that the enclosed elements should appear in the specified order (if they are required). The choice directive allows us to specify alternative child elements like food or foodRecipe. We declared the legal values for our animalClass attribute using a restriction declaration and enumeration tags.

Simple types

Although we’ve not really exercised it here, the type attribute of our elements touches on the standardization of types in XML Schema. All of our “text” elements specify a type xs:string, which is a standard XML Schema string type (kind of equivalent to PCDATA in our DTD). There are many other standard types covering things such as dates, times, periods, numbers, and even URLs. These are called simple types (though some of them are not so simple) because they are standardized or “built-in.” Table 24-5 lists W3C Schema simple types and their corresponding Java types. The correspondence will become useful later when we talk about JAXB and automated binding of XML to Java classes.

Table 24-5. W3C Schema simple types

Schema element type

Java type




"This is text"



true, false, 1, 0























































For example, we have a floating-point weight element like this in our animal:


We can now validate it in our schema by inserting the following entry at the appropriate place:

<xs:element name="weight" type="xs:double"/>

In addition to enforcing that the content of elements matches these simple types, XML Schema can give us much more control over the text and values of elements in our document using simple rules and patterns analogous to regular expressions.

Complex types

In addition to the predefined simple types listed in Table 24-5, we can define our own, complex types in our schema. Complex types are element types that have internal structure and possibly child elements. Our inventory, animal, and foodRecipe elements are all complex types and their content must be declared with the complexType tag in our schema. Complex type definitions can be reused, similar to the way that element definitions can be reused in our schema; that is, we can break out a complex type definition and give it a name. We can then refer to that type by name in the type attributes of other elements. Because all of our complex types were only used once in their corresponding elements, we didn’t give them names. They were considered anonymous type definitions, declared and used in the same spot. For example, we could have separated our animal’s type from its element declaration, like so:

<xs:element name="inventory">
       <xs:element name="animal" maxOccurs="unbounded" 
<xs:complexType name="AnimalType">
    <xs:element ref="name"/>
    <xs:element name="species" type="xs:string"/>
    <xs:element name="habitat" type="xs:string"/>

Declaring the AnimalType separately from the instance of the animal element declaration would allow us to have other, differently named elements with the same structure. For example, our inventory element may hold another element, mainAttraction, which is a type of animal with a different tag name.

There’s a lot more to say about W3C XML Schema and they can get quite a bit more complex than our simple example. However, you can do a lot with the few pieces we’ve previously shown. Some tools are available to help you get started. We’ll talk about one called Trang in a moment. For more information about XML Schema, see the W3C’s site or XML Schema by Eric van der Vlist (O’Reilly). In the next section, we’ll show how to validate a file or DOM model against the XML Schema we’ve just created, using the new validation API.

Generating Schema from XML samples

Many tools can help you write XML Schema. One helpful tool is called Trang. It is part of an alternative schema language project called RELAX NG (which we mention later in this chapter), but Trang is very useful in and of itself. It is an open source tool that can not only convert between DTDs and XML Schema, but also create a rough DTD or XML Schema by reading an “example” XML document. This is a great way to sketch out a basic, starting schema for your documents.

The Validation API

To use our example’s XML schema, we need to exercise the new javax.xml.validation API. As we said earlier, the validation API is an alternative to the simple, parser-based validation supported through the setValidating() method of the parser factories. To use the validation package, we create an instance of a SchemaFactory, specifying the schema language. We can then validate a DOM or stream source against the schema.

The following example, Validate, is in the form of a simple command-line utility that you can use to test out your XML and schemas. Just give it the XML filename and an XML Schema file (.xsd file) as arguments:

    import javax.xml.XMLConstants;
    import javax.xml.validation.*;
    import org.xml.sax.*;
    import javax.xml.transform.sax.SAXSource;
    import javax.xml.transform.Source;
    public class Validate
        public static void main( String [] args ) throws Exception {
            if ( args.length != 2 ) {
                System.err.println("usage: Validate xmlfile.xml xsdfile.xsd");
            String xmlfile = args[0], xsdfile = args[1];
            SchemaFactory factory =
            SchemaFactory.newInstance( XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema( new StreamSource( xsdfile ) );
            Validator validator = schema.newValidator();
            ErrorHandler errHandler = new ErrorHandler() {
                public void error( SAXParseException e ) {
                public void fatalError( SAXParseException e ) {
                public void warning( SAXParseException e ) { 
            validator.setErrorHandler( errHandler );
            try {
                validator.validate( new SAXSource(
                new InputSource("zooinventory.xml") ) );
            } catch ( SAXException e ) {
                // Invalid Document, no error handler

The schema types supported initially are listed as constants in the XMLConstants class. Right now, only W3C XML Schema is implemented and there is also another intriguing type in there that we’ll mention later. Our validation example follows the pattern we’ve seen before, creating a factory, then a Schema instance. The Schema represents the grammar and can create Validator instances that do the work of checking the document structure. Here, we’ve called the validate() method on a SAXSource, which comes from our file, but we could just as well have used a DOMSource to check an in-memory DOM representation:

validator.validate( new DOMSource(document) );

Any errors encountered will cause the validate method to throw a SAXException, but this is just a coarse means of detecting errors. More generally, and as we’ve shown in this example, we’d want to register an ErrorHandler object with the validator. The error handler can be told about many errors in the document and convey more information. When the error handler is present, the exceptions are given to it and not thrown from the validate method.

The errors generated by these parsers can be a bit cryptic. In some cases, the errors may not be able to report line numbers because the validation is not necessarily being done against a stream.

Alternative schema languages

In addition to DTDs and W3C XML Schema, several other popular schema languages are being used today. One interesting alternative that is tantalizingly referenced in the XMLConstants class is called RELAX NG. This schema language offers the most widely used features of XML Schema in a more human-readable format. In fact, it offers both a very compact, non-XML syntax and a regular XML-based syntax. RELAX NG doesn’t offer the same text pattern and value validation that W3C XML Schema does. Instead, these aspects of validation are left to other tools (many people consider this to be “business logic,” more appropriately implemented outside of the schema anyway). If you are interested in exploring other schema languages, be sure to check out RELAX NG and its useful schema conversion utility, Trang.

The best content for your career. Discover unlimited learning on demand for around $1/day.