Serialization Details

The Infoset deals exclusively in the Unicode character set; all strings and characters map onto Unicode code points, which ultimately are just well-known numeric values. For flexibility, XML 1.0's serialization format supports a broad range of non-Unicode encodings. In some scenarios, the encoding scheme in use can be communicated via out-of-band techniques (e.g., a surrounding MIME header). XML 1.0 also supports a more self-contained technique that does not rely on external information. Each parsed entity may begin with a declaration indicating which encoding scheme is in use.

<?xml version='1.0' encoding='UTF-8' ?>

This declaration is optional if the parsed entity is encoded as UTF-8 or UTF-16. The declaration is mandatory ...
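
The effect of the encoding declaration can be illustrated with a small sketch. It uses Python's standard xml.etree.ElementTree module purely as a convenient parser; the helper names make_entity and read_text are hypothetical and exist only for this illustration. The sketch assumes a single-element document whose text needs no escaping.

```python
import xml.etree.ElementTree as ET


def make_entity(text, encoding, declare):
    """Serialize <word>text</word> to bytes in the given encoding.

    Hypothetical helper: `text` is assumed to contain no characters
    that need escaping (no '<' or '&').
    """
    decl = f"<?xml version='1.0' encoding='{encoding}'?>" if declare else ""
    return (decl + f"<word>{text}</word>").encode(encoding)


def read_text(data):
    """Parse the serialized bytes and return the element's character data."""
    return ET.fromstring(data).text


word = "caf\u00e9"  # the last character is code point U+00E9

# Same character content, different bytes on the wire.
latin1 = make_entity(word, "ISO-8859-1", declare=True)
utf8 = make_entity(word, "UTF-8", declare=False)  # declaration omitted

# Both serializations yield the same Unicode string once parsed.
print(read_text(latin1) == read_text(utf8) == word)

# Without a declaration the parser assumes UTF-8, so Latin-1 bytes
# are rejected as not well-formed.
try:
    read_text(make_entity(word, "ISO-8859-1", declare=False))
except ET.ParseError as err:
    print("rejected:", err)
```

The two byte sequences differ (U+00E9 is one byte in ISO-8859-1 and two bytes in UTF-8), yet the parsed character data is identical, which is the sense in which the Infoset deals only in Unicode code points.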
