HTML4 Entity Sets

HTML 4.0 predefines several hundred named entities, many of which are quite useful. For instance, the nonbreaking space is  . XML, however, defines only five named entities:

&

The ampersand (&)

<

The less-than sign (<)

&gt;

The greater-than sign (>)

&quot;

The straight double quote (“)

&apos;

The straight single quote (')

Other needed characters can be inserted with character references in decimal or hexadecimal format. For instance, the nonbreaking space is Unicode character 160 (decimal). Therefore, you can insert it in your document as either &#160; or &#xA0;. If you really want to type it as &nbsp;, you can define this entity reference in your DTD. Doing so requires you to use a character reference:

<!ENTITY nbsp "&#160;">

The XHTML 1.0 specification includes three DTD fragments that define the familiar HTML character references:

Latin-1 characters (http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent)

The non-ASCII, graphic characters included in ISO-8859-1 from code points 160 through 255, shown in Table 27-3

Special characters (http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent)

A few useful letters and punctuation marks not included in Latin-1

Symbols (http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent)

The Greek alphabet, plus various arrows, mathematical operators, and other symbols used in mathematics

Feel free to borrow these entity sets for your own use. They should be included in your document’s DTD with these parameter entity references and PUBLIC identifiers: ...

Get XML in a Nutshell, 3rd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.