Name
Characters
Synopsis
XML documents are inherently text documents, which are composed of characters. To ensure that documents are portable across disparate computer systems and can contain content in as many written human languages as possible, XML parsers are required to implement the Unicode standard. This does not mean that all XML documents must be saved and edited in Unicode, but it does mean that the XML parser must be able to convert your document from its native character encoding to Unicode. All XML parsers are required to support, at a minimum, both UTF-8 and UTF-16 as input encodings. For more information on encoding formats and Unicode, see Chapter 27.
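As a rough illustration of this conversion, the following sketch serializes the same small document as UTF-8 and as UTF-16 and parses each byte stream back into Unicode text. It assumes Python's standard `xml.etree.ElementTree` module as the parser. The `parse_text` helper, the `greeting` element, and the sample text are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Sample text with non-ASCII characters (e-acute and a snowman).
TEXT = "caf\u00e9 \u2603"


def parse_text(encoding: str) -> str:
    """Encode a tiny document in `encoding`, parse the bytes, return the element text."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><greeting>{TEXT}</greeting>'
    raw = doc.encode(encoding)  # the bytes as they would be stored on disk
    return ET.fromstring(raw).text  # the parser decodes the bytes to Unicode


if __name__ == "__main__":
    for enc in ("UTF-8", "UTF-16"):
        # Both encodings should yield the same Unicode string.
        print(enc, parse_text(enc) == TEXT)
```

The bytes differ between the two encodings, but the application sees the same Unicode string either way, which is the point of the requirement.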
Tip
One of the primary differences between XML 1.0 and XML 1.1 is the definition of which Unicode characters are valid within an XML document. In XML 1.0, many of the ASCII control characters (such as BEL and NAK) were explicitly disallowed within XML documents. XML 1.1 permits these control characters (except for null, 0x0000) as long as they’re escaped with numeric character references. XML 1.1 also requires that the C1 controls between 0x0080 and 0x009F (other than NEL, 0x0085) be escaped with numeric character references, which XML 1.0 does not require.
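The difference can be sketched as three small predicates over code points. The ranges below are transcribed from the `Char` and `RestrictedChar` productions of the XML 1.0 and XML 1.1 recommendations as best understood here. The function names are hypothetical, and this is a sketch rather than a validating implementation.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point `cp` may appear in an XML 1.0 document."""
    return (
        cp in (0x9, 0xA, 0xD)  # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """True if code point `cp` may appear in an XML 1.1 document (possibly only escaped)."""
    return (
        0x1 <= cp <= 0xD7FF  # everything except null and the surrogate block
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def needs_reference_in_xml11(cp: int) -> bool:
    """True if XML 1.1 allows `cp` only as a numeric character reference."""
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84  # DEL and the C1 controls below NEL
        or 0x86 <= cp <= 0x9F  # the C1 controls above NEL (0x85)
    )


if __name__ == "__main__":
    for name, cp in (("NUL", 0x00), ("BEL", 0x07), ("NAK", 0x15), ("C1 0x80", 0x80)):
        print(name, is_xml10_char(cp), is_xml11_char(cp), needs_reference_in_xml11(cp))
```

Under these rules, BEL would be written as `&#x7;` in an XML 1.1 document and rejected outright by an XML 1.0 parser, while null is rejected by both.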