Character Entities

Characters not found in the normal alphanumeric character set, such as < and &, must be specified in HTML and XHTML documents using character references. This process is known as escaping the character. In (X)HTML documents, escaped characters are indicated by character references that begin with & and end with ;. The character may be referred to by its Numeric Character Reference (NCR) or a predefined character entity name.
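As a rough illustration of escaping, the sketch below uses Python's standard `html` module. Python is an assumption made here for demonstration only; the sample string and variable names are hypothetical and not part of the original text.

```python
import html

# html.escape replaces &, <, and > with named character references;
# with its default quote=True it also replaces the double quote.
raw = 'if (a < b && c > d) say "hi"'
escaped = html.escape(raw)
print(escaped)

# html.unescape turns the character references back into characters.
restored = html.unescape(escaped)
print(restored == raw)
```

Each reference in the escaped string begins with & and ends with ;, as described above.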

A Numeric Character Reference refers to a character by its Unicode code point in either decimal or hexadecimal form. Decimal character references use the syntax &#nnnn;. Hexadecimal values are indicated by an “x”: &#xhhhh;. For example, the less-than (<) character could be identified as &#60; (decimal) or &#x3C; (hexadecimal).
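The two numeric forms can be derived mechanically from a character's code point. The following is a minimal sketch in Python (chosen here for illustration); the helper name `numeric_references` is hypothetical.

```python
def numeric_references(char):
    """Return the (decimal, hexadecimal) character references for one character."""
    code_point = ord(char)               # Unicode code point as an integer
    decimal = f"&#{code_point};"         # decimal form, e.g. &#60;
    hexadecimal = f"&#x{code_point:X};"  # hexadecimal form, e.g. &#x3C;
    return decimal, hexadecimal

print(numeric_references("<"))  # ('&#60;', '&#x3C;')
print(numeric_references("&"))  # ('&#38;', '&#x26;')
```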

Character entities are abbreviated names for characters, such as &lt; for the less-than symbol. Character entities are predefined in the DTDs of markup languages such as HTML and XHTML as a convenience to authors because they may be easier to remember than Numeric Character References.
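To see that an entity name is only an alias for a code point, the sketch below looks one up in Python's standard `html.entities` table and decodes all three forms of the same character. Python is an assumption made for illustration; the variable names are hypothetical.

```python
import html
from html.entities import name2codepoint

# name2codepoint maps HTML 4 entity names to Unicode code points.
print(name2codepoint["lt"])  # 60

# The named, decimal, and hexadecimal forms all decode to the same character.
forms = ["&lt;", "&#60;", "&#x3C;"]
decoded = [html.unescape(form) for form in forms]
print(decoded)  # ['<', '<', '<']
```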

ASCII Character Set

HTML and XHTML documents use the standard 7-bit ASCII character set in their source. The first 32 characters in ASCII (code points 0 through 31, not listed) are device controls such as backspace (&#08;) and carriage return (&#13;) and are not appropriate for use in HTML documents.
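One way to spot such characters in a source string is to test each code point against the control range. This is a minimal Python sketch with a hypothetical helper name; it flags every code point below 32, whereas a practical checker would probably let tab, line feed, and carriage return through, since they routinely appear as whitespace in source files.

```python
def find_control_chars(text):
    """Return (index, code point) pairs for ASCII control characters (0 through 31)."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if ord(ch) < 32]

sample = "ab\x08c\r"  # contains backspace (8) and carriage return (13)
print(find_control_chars(sample))  # [(2, 8), (4, 13)]
```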

HTML 4.01 defines only four entities in this character range: less than (<, &lt;), greater than (>, &gt;), ampersand (&, &amp;), and quotation mark (", &quot;), which are necessary ...
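The count of four can be checked against Python's `html.entities.name2codepoint` table, which the Python documentation describes as a mapping of HTML 4 entity names to code points. The sketch below is illustrative only; the variable name `ascii_entities` is hypothetical.

```python
from html.entities import name2codepoint

# Keep only the HTML 4 entity names whose code point lies in the 7-bit ASCII range.
ascii_entities = {name: cp for name, cp in name2codepoint.items() if cp < 128}

# Print each one as: entity reference, character, decimal reference.
for name, cp in sorted(ascii_entities.items(), key=lambda item: item[1]):
    print(f"&{name};  {chr(cp)}  &#{cp};")
```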

