Chapter 4. Unmarshalling

In this chapter, we move from creating Java source files to creating Java objects. In Chapter 3, you built a framework of objects (compiled source files) that represented your constraints. However, this framework isn’t particularly useful on its own. Just as a DTD isn’t of much use without XML, generated classes aren’t any good without instance data. We take the next logical step in this chapter and work on taking an XML document and generating instance data.

I start out by walking you through the process flow for unmarshalling, which is the technical term for converting an XML document into Java object instances. This will give you the same background as the class generation process flow section did and prepare you to work through the rest of the chapter. From there on, it’s all working code. First, I discuss creating instance documents, XML documents that conform to your constraint set. Once you’ve got your data represented in that format, you’re ready to convert the XML into Java; the result is instances of the classes you generated in the last chapter. Finally, I cover how to take this data, in Java format, and use it within your application. You’ll want to have your XML editor and Java IDE fired up because there is a lot of code in this chapter; let’s get to it.

Process Flow

As with class generation, I want to spend a little time walking through the process flow of unmarshalling XML data into Java objects. This is useful in understanding exactly what happens when you invoke that unmarshal( ) method (or whatever it’s called in your framework). Rather than relying on a black box process, you’ll know exactly what goes on, be able to troubleshoot oddities in your applications, and maybe even help out the framework programmers with a bug here and there. The process has three steps:

  1. Construct XML data to unmarshal into Java objects.

  2. Convert the XML data into instances of generated Java objects.

  3. Use the resultant Java object instances.

Each step is detailed in the sections that follow.

XML Data

First, you need to have some XML data to start with. This probably isn’t any great revelation to you, but it’s worth taking a look at. You’ll need an XML document that matches up with the constraints designed in the class generation process. Additionally, this document must be valid with respect to those constraints. Valid means that the structure and data in the document fulfill the data contract set out by your DTD. I talk in detail about how to validate your documents both before and during data binding later on in this chapter.
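As a rough illustration of what "valid" means in practice, the following sketch checks a document against a DTD using the standard JAXP parser, independently of any data binding framework. The DTD and its element names (movies, movie, title) are assumptions made up for this example, not the constraint set from Chapter 3, and the DTD is embedded as an internal subset only so that the example needs no files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {

    // Hypothetical DTD, written as an internal subset so the example is self-contained.
    static final String DOCTYPE =
        "<!DOCTYPE movies [\n"
      + "  <!ELEMENT movies (movie+)>\n"
      + "  <!ELEMENT movie (title)>\n"
      + "  <!ELEMENT title (#PCDATA)>\n"
      + "]>\n";

    // Parses the document with DTD validation switched on and returns every
    // problem the parser reports; an empty list means the document is valid.
    static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<String>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) {
                    // Warnings do not affect validity, so they are ignored here.
                }
                public void error(SAXParseException e) {
                    // Validity violations arrive as recoverable errors.
                    problems.add(e.getMessage());
                }
                public void fatalError(SAXParseException e) throws SAXException {
                    // Malformed XML cannot be recovered from; stop parsing.
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            problems.add(e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser setup or input reading failed", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        String valid = DOCTYPE + "<movies><movie><title>Example</title></movie></movies>";
        String invalid = DOCTYPE + "<movies><movie><rating>PG</rating></movie></movies>";
        System.out.println("valid document problems:   " + validate(valid));
        System.out.println("invalid document problems: " + validate(invalid));
    }
}
```

The second document is well-formed but breaks the data contract, because movie is declared to contain a title and nothing else. That is the kind of mismatch worth catching before the document reaches the unmarshalling step.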

There’s not a lot of complexity in this step, so I won’t dwell on it. There are certainly some subtle issues to work through in ensuring that the data in your XML document correctly maps to where it belongs in your Java classes, and I cover that in the more detailed sections of the chapter. For now, though, as long as you’ve got an XML document and have a set of generated classes from the document’s DTD, you’re ready to roll.

Java Conversion

The heart of the unmarshalling process is the conversion from XML to Java. This is where the most interesting action takes place in any framework. However, it’s also the place where the process itself varies the most between frameworks. While the starting point (an XML document) and ending point (Java object instances) are the same, the “in-between” is not. Still, some basic principles are at work in every framework, and they are worth understanding.

First, you’ll need to convert your XML data into some form of an input stream (usually an InputStream or Reader in Java parlance). This may seem too simple to be worth mentioning, but it turns out to be an important point. It’s a common misconception to think about data binding as a process that takes an XML file and converts it to Java instance data. However, it’s just as likely that the XML data come from a network stream, email message, or some other medium entirely, as opposed to a static file on a hard drive. This opens up all sorts of possibilities and also allows you to think a bit outside of the box. Consider taking a SOAP message, the response to a questionnaire, or an XML shipping manifest, all from a third party. Instead of having to write SAX or DOM code to deal with this information, data binding allows a simple means of interacting with this business data in a business way—a very handy option to have available.
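To make the point about input sources concrete, here is a minimal sketch showing that XML held in memory can be handed over through the same Reader abstraction as XML in a file. The class and method names are invented for this illustration, and the drain( ) method is only a stand-in for a framework's unmarshalling step. The byte-based variant assumes UTF-8; in real code you would more likely pass the raw InputStream to the parser and let it detect the encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class XmlSourceSketch {

    // XML that arrived as text, for example the body of a message.
    static Reader fromText(String xml) {
        return new StringReader(xml);
    }

    // XML that arrived as raw bytes, for example a network payload.
    // Assumption: the payload is UTF-8 encoded.
    static Reader fromBytes(byte[] payload) {
        return new InputStreamReader(
            new ByteArrayInputStream(payload), StandardCharsets.UTF_8);
    }

    // Stand-in for an unmarshalling step: it only cares that it receives a
    // Reader, not where the characters originally came from.
    static String drain(Reader source) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        try (Reader in = source) {
            int count;
            while ((count = in.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<movies><movie><title>Example</title></movie></movies>";
        System.out.println(drain(fromText(xml)));
        System.out.println(drain(fromBytes(xml.getBytes(StandardCharsets.UTF_8))));
    }
}
```

A socket or URL connection fits the same pattern, since java.net.Socket.getInputStream( ) and java.net.URL.openStream( ) both return an InputStream.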

The actual object that the unmarshal( ) method is invoked on is where variance begins to creep in. For example, using JAXB, generated classes are all concrete; to unmarshal an object, you will have code like this:

// Get the input stream for the XML
InputStream inputStream = getXMLInputStream(  );
  
// Unmarshal into an object
Movies moviesObject = Movies.unmarshal(inputStream);
  
// Operate on the instance data

This approach would seem to create a problem for Zeus, though, since Zeus generates interfaces rather than concrete classes. Because unmarshal( ) must be a static method (you don’t have instance data yet, so there is no instance to invoke it on), it has no place on the generated interface and can exist only on an implementation class. To get around this issue, Zeus generates an additional class, called [top-level-object]Unmarshaller. Since movies is the top-level element in the movie database XML, this class would be MoviesUnmarshaller. Invoke the unmarshal( ) method on this class like this:

// Get the input stream for the XML
InputStream inputStream = getXMLInputStream(  );
  
// Unmarshal into an object
Movies moviesObject = MoviesUnmarshaller.unmarshal(inputStream);
  
// Operate on instance data

You’ll see similar variances in other frameworks. In all cases, the method hands back the top-level Java object instance. Depending on the framework, it may be typed as a plain Object, in which case you have to cast it to the expected type, as shown here:

// Get the input stream for the XML
InputStream inputStream = getXMLInputStream(  );
  
// Unmarshal into an object
Movies moviesObject = (Movies) Unmarshaller.unmarshal(inputStream);
  
// Operate on instance data

Still, while these approaches may vary, the basic result is the same: a Java object instance that you can then use to access the XML data without having to work in XML.
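To show the shape of the interface-plus-unmarshaller arrangement, here is a hand-written sketch. It is not the code that Zeus or JAXB generates. The element names (movies, movie, title), the getTitles( ) accessor, and the DOM-based parsing are all assumptions chosen to keep the example short. What it does show is a static unmarshal( ) method living on a separate class, so that application code only ever sees the interface.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// The type application code works with; in this design it carries no
// static unmarshal( ) method. The accessor is hypothetical.
interface Movies {
    List<String> getTitles();
}

// Implementation class that holds the instance data.
class MoviesImpl implements Movies {
    private final List<String> titles = new ArrayList<String>();

    void addTitle(String title) {
        titles.add(title);
    }

    public List<String> getTitles() {
        return Collections.unmodifiableList(titles);
    }
}

// Separate class that owns the static entry point.
class MoviesUnmarshaller {
    private MoviesUnmarshaller() {
        // Static use only.
    }

    static Movies unmarshal(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);
            MoviesImpl movies = new MoviesImpl();
            // Assumption: every title element in the document is a movie title.
            NodeList titles = doc.getElementsByTagName("title");
            for (int i = 0; i < titles.getLength(); i++) {
                movies.addTitle(titles.item(i).getTextContent().trim());
            }
            return movies;
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalArgumentException("Could not unmarshal XML", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<movies><movie><title>Example One</title></movie>"
                   + "<movie><title>Example Two</title></movie></movies>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Movies movies = MoviesUnmarshaller.unmarshal(in);
        System.out.println(movies.getTitles());
    }
}
```

Because the caller receives a Movies reference, the implementation class can change without affecting application code, which is the usual reason for generating interfaces in the first place.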

Result Objects

Once you’ve performed unmarshalling, you’re left with a set of result object instances. The returned value from the unmarshalling process, as I already mentioned, is the top-level instance of the unmarshalled XML document. This is an instance of the class that corresponds to the root element of your XML document, and it holds references to any member objects as well. Thus, for the movie database shown in the last chapter (Example 3-2), you would end up with an object tree like that shown in Figure 4-1.

Figure 4-1. Object instance tree for movie database

Other than understanding this structure, there’s not much else to these result objects. In fact, that’s what is worth emphasizing here: these result objects are normal, ordinary Java object instances. There aren’t any special instructions to use them, gotchas to worry about, or other pitfalls.
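A short sketch of what "ordinary" means here: once the data is in objects, standard collection code works on it and no XML API is involved. The Movie class below is a hypothetical stand-in for a generated class, and its title and year properties are assumptions for this example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a generated class; a real one would come from
// your framework's class generation step.
class Movie {
    private final String title;
    private final int year;

    Movie(String title, int year) {
        this.title = title;
        this.year = year;
    }

    String getTitle() {
        return title;
    }

    int getYear() {
        return year;
    }
}

class ResultObjectsSketch {

    // Returns the titles ordered by year, oldest first, leaving the
    // caller's list untouched.
    static List<String> titlesByYear(List<Movie> movies) {
        List<Movie> sorted = new ArrayList<Movie>(movies);
        Collections.sort(sorted, new Comparator<Movie>() {
            public int compare(Movie a, Movie b) {
                return Integer.compare(a.getYear(), b.getYear());
            }
        });
        List<String> titles = new ArrayList<String>();
        for (Movie movie : sorted) {
            titles.add(movie.getTitle());
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Movie> movies = Arrays.asList(
            new Movie("Later Example", 2001),
            new Movie("Earlier Example", 1999));
        System.out.println(titlesByYear(movies));
    }
}
```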

Use these objects as you would any others, and don’t worry about them being data bound. And with that (lack of) admonition, you’ve got a handle on the unmarshalling process flow. Figure 4-2 illustrates the entire process.

Figure 4-2. Unmarshalling process flow
