Character References

In contrast to SGML and HTML (which were designed to handle ASCII-based languages such as English and other European languages), XML is based on the Unicode and ISO/IEC 10646 standards, and is geared to support languages such as Hindi, Arabic, and Chinese as well.

However, a problem arises if you want to use such languages in your XML document and your keyboard cannot produce their characters. To handle this, you can use character references. A character reference is a string that starts with &#, followed by the decimal number of the character in the ISO/IEC 10646 character set (or &#x followed by the same number in hexadecimal), and is terminated by a semicolon (;). For example, to represent the copyright (©) symbol, you will need to use ...
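As a rough illustration of how a parser treats character references, the sketch below feeds two tiny documents to the standard JAXP DOM parser and compares the results. The copyright sign is code point 169 in ISO/IEC 10646 (A9 in hexadecimal), so both references should resolve to the same character. The class name CharRefDemo, the helper rootText, and the notice element are made up for this example and are not taken from the book.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CharRefDemo {

    // Hypothetical helper: parses a small XML string and returns the
    // text content of its root element, with character references resolved.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the example short.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Decimal and hexadecimal references to the same character.
        String decimal = rootText("<notice>&#169; 2002</notice>");
        String hex = rootText("<notice>&#xA9; 2002</notice>");

        // Both references resolve to the same text.
        System.out.println(decimal.equals(hex));

        // Print the code point rather than the glyph to avoid
        // depending on the console encoding.
        System.out.println((int) decimal.charAt(0));
    }
}
```

The application never sees the reference itself: by the time the text reaches the DOM, the parser has already replaced it with the character it names.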
