The Encoding Declaration

Every XML document should have an encoding declaration as part of its XML declaration. The encoding declaration tells the parser in which character set the document is written. It’s used only when other metadata from outside the file is not available. For example, this XML declaration says that the document uses the character encoding US-ASCII:

<?xml version="1.0" encoding="US-ASCII" standalone="yes"?>

This one states that the document uses the Latin-1 character set, although it uses the more official name ISO-8859-1:

<?xml version="1.0" encoding="ISO-8859-1"?>

Even if metadata is not available, the encoding declaration can be omitted if the document is written in either the UTF-8 or UTF-16 encodings of Unicode. UTF-8 is a strict superset of ASCII, so ASCII files can be legal XML documents without an encoding declaration. Note, however, that this only applies to genuine, pure 7-bit ASCII files. It does not include the extended ASCII character sets that some editors produce with characters like ©, ç, or “.

Even if character set metadata is available, many parsers ignore it. Thus, we highly recommend including an encoding declaration in all your XML documents that are not written in UTF-8 or UTF-16. It certainly never hurts to do so.

Get XML in a Nutshell, 3rd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.