Chapter 20. Processing XML

Introduction

The Extensible Markup Language, or XML, is a portable, human-readable format for exchanging text or data between programs. XML is derived from the parent standard SGML, as is HTML, the language used on web pages worldwide. XML, then, is HTML’s younger but more capable sibling. Because most developers know at least a bit of HTML, parts of this discussion compare XML with HTML. XML’s lesser-known grandparent is IBM’s GML (General Markup Language), and one of its cousins is Adobe FrameMaker’s Maker Interchange Format (MIF). Figure 20-1 depicts the family tree.

Figure 20-1. XML’s ancestry

One way of thinking about XML is that it’s like HTML cleaned up, consolidated, and—most importantly—with the ability for you to define your own tags. It’s HTML with tags that can and should identify the informational content as opposed to the formatting. Another way of perceiving XML is as a general interchange format for such things as business-to-business communications over the Internet or as a human-editable[57] description of things as diverse as word-processing files and Java documents. XML is all these things, depending on where you’re coming from as a developer and where you want to go today—and tomorrow.

Because it is text, XML can be generated from Java in a number of ways. For very simple cases, you can just use good old out.println(), but this is not ...
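As a minimal sketch of the println() approach mentioned above, the following writes a tiny document by hand. The class name, the element names (person, name, email), and the escape() helper are all hypothetical, chosen only for illustration. One limitation that is certain: println() does no escaping of its own, so any text containing characters such as & or < has to be escaped by the caller.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical example: hand-generating a small XML document with println(). */
public class PrintlnXmlSketch {

    /** Replaces the five characters that have predefined entities in XML. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Prints one person record; the element names are made up for this sketch. */
    public static void writePerson(PrintWriter out, String name, String email) {
        out.println("<?xml version=\"1.0\"?>");
        out.println("<person>");
        out.println("  <name>" + escape(name) + "</name>");
        out.println("  <email>" + escape(email) + "</email>");
        out.println("</person>");
    }

    /** Convenience wrapper that collects the output in a String. */
    public static String personXml(String name, String email) {
        StringWriter sw = new StringWriter();
        PrintWriter out = new PrintWriter(sw);
        writePerson(out, name, email);
        out.flush();
        return sw.toString();
    }

    public static void main(String[] args) {
        // The ampersand in the name is emitted as &amp; by escape().
        System.out.print(personXml("Ian & Co.", "ian@example.com"));
    }
}
```

Even in a document this small, the escaping and the matching of open and close tags are entirely the programmer's responsibility, and that burden grows quickly with the size of the document.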

