XML Structures

Everything in an XML document is text, typically Unicode text. Special characters (primarily < and >, but also &, ", and ') are used to separate the text that identifies document structures from the text contained in those structures. The text that represents the structure of the document is called markup, as historically it was extra information added to text documents to provide metadata, formatting, or other information. Adding this information to a document is referred to as “marking up” the document, although text and markup are usually created simultaneously now.
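As a rough illustration of the split between markup and text, the following sketch uses Python's standard library to escape the five special characters in a piece of character data, wrap the result in an element, and parse it back. The sample string, the element name note, and the extra quote mapping passed to escape() are assumptions chosen for this example; they do not come from the XML specs.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical character data containing all five special characters.
raw = "Tom & Jerry's \"1 < 2 > 0\""

# escape() replaces &, <, and > by default. The extra mapping (an
# assumption made for this sketch) also replaces the two quote characters.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# The element name "note" is illustrative only.
document = f"<note>{escaped}</note>"

# Parsing resolves the entity references back to the original characters,
# so the text survives the round trip while the tags remain markup.
root = ET.fromstring(document)

print(escaped)
print(root.text == raw)
```

Only the tags are treated as markup here. Because the special characters inside the element were escaped, the parser reads them as text rather than as the start of a new structure.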

As each structure is discussed, applicable productions from the XML 1.0 and 1.1 specs will be listed in the order in which they appear in the specs. However, productions for Letter, BaseChar, Ideographic, CombiningChar, Digit, and Extender are omitted here for the sake of brevity (refer to Appendix B in the 1.0 spec, at http://www.w3.org/TR/REC-xml/#CharClasses). If the 1.0 and 1.1 productions differ, the line representing the production will be followed by either 1.0 or 1.1; otherwise, the productions in both specs are the same. Productions may be repeated for the reader’s convenience.

You will find references to the XML specification in this section. Any reference preceded by a section symbol (§) is a reference to the XML spec. For example, §2.1 refers to Section 2.1 of the XML 1.0 and 1.1 specifications.
