Chapter 22. XML

I am a little world made cunningly / Of elements, and an angelic sprite

John Donne, Holy Sonnets

Introduction

The Extensible Markup Language (XML) standard was released in 1998. It quickly became the standard way to represent and exchange almost every kind of data, from books to genes to function calls.

XML succeeded where earlier “standard” data formats had failed, including its own ancestor, the Standard Generalized Markup Language (SGML). There are three reasons for XML’s success: it is text-based instead of binary, it is simple rather than complex, and it has a superficial resemblance to HTML.

Text

The designers of Unix recognized, nearly 30 years before XML, that humans interact with computers primarily through text. Text files are therefore the only files any system is guaranteed to be able to read and write. Because XML is text, programmers can easily make legacy systems emit XML reports.

Simplicity

As we’ll see, a lot of complexity has arisen around XML, but the XML standard itself is very simple. There are very few things that can appear in an XML document, but from those basic building blocks you can build extremely complex systems.

HTML

XML is not HTML, but XML and HTML share a common ancestor: SGML. The superficial resemblance meant that the millions of programmers who had to learn HTML to put data on the web were able to learn (and accept) XML more easily.
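To make the point about text concrete, here is a minimal sketch of a program emitting an XML report using nothing but ordinary string operations. It is written in Python purely for illustration. The function name `emit_report` and the `report`, `item`, and `id` names are hypothetical and come from nowhere in the XML standard. The only library help it takes is for escaping the characters that XML reserves (`&`, `<`, `>`, and quotes in attribute values).

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET


def emit_report(rows):
    """Build an XML report as plain text, one line per record.

    `rows` is assumed to be a list of dicts with "id" and "name" keys;
    the element and attribute names are made up for this sketch.
    """
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<report>"]
    for row in rows:
        # quoteattr() escapes the value and adds the surrounding quotes;
        # escape() handles &, <, and > in element content.
        lines.append(
            "  <item id=%s>%s</item>"
            % (quoteattr(str(row["id"])), escape(row["name"]))
        )
    lines.append("</report>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = emit_report([{"id": 1, "name": "Nuts & Bolts"}])
    print(text, end="")
    # Any XML parser should be able to read the text back.
    root = ET.fromstring(text.encode("utf-8"))
    print(root[0].get("id"), root[0].text)
```

Because the output is only text, the same approach works in any language that can print a string. The one thing a hand-rolled emitter must not skip is the escaping step, since an unescaped `&` or `<` in the data makes the document ill-formed.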

Syntax

Example 22-1 shows a simple XML document.

Example 22-1. Simple XML document
<?xml version="1.0" encoding="UTF-8"?>
<books>
<!-- ...
