Appendix A. Character Entities

Characters that have special meaning in markup, such as < and &, and characters that are difficult to type or unavailable in the document’s character encoding may be specified in HTML and XHTML documents using character references. This process is known as escaping the character. A character reference begins with & and ends with ;. It may identify the character by its Numeric Character Reference (NCR) or by a predefined character entity name.
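As a rough illustration of escaping, the following Python sketch uses the standard library's html.escape function to replace markup-significant characters with character references. The helper name escape_text is a placeholder chosen for this example and does not come from the HTML specification.

```python
import html


def escape_text(text: str) -> str:
    """Replace &, <, > and quote characters with character references."""
    # quote=True also escapes " and ' so the result is safe in attribute values.
    return html.escape(text, quote=True)


print(escape_text("a < b & c"))  # a &lt; b &amp; c
```

The ampersand itself must be escaped, because an unescaped & would otherwise be read as the start of another character reference.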

A Numeric Character Reference refers to a character by its Unicode code point in either decimal or hexadecimal form. Hexadecimal values are indicated by an “x”: &#xhhhh;. Decimal character references use the syntax &#nnnn; (no “x” character). For example, the em-dash (—) character has the Unicode code point U+2014, which can be written as &#x2014; (hexadecimal) or &#8212; (decimal) in an HTML document.
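The relationship between a code point and its two numeric forms can be sketched in Python. The function names to_hex_ncr and to_decimal_ncr are hypothetical helpers written for this example. html.unescape is the standard library function that resolves references back to characters.

```python
import html


def to_hex_ncr(ch: str) -> str:
    """Build the hexadecimal reference (&#xhhhh;) for one character."""
    return f"&#x{ord(ch):X};"


def to_decimal_ncr(ch: str) -> str:
    """Build the decimal reference (&#nnnn;) for one character."""
    return f"&#{ord(ch)};"


em_dash = "\u2014"  # the em-dash, code point U+2014

print(to_hex_ncr(em_dash))      # &#x2014;
print(to_decimal_ncr(em_dash))  # &#8212;

# Both forms resolve to the same character.
print(html.unescape("&#x2014;") == html.unescape("&#8212;") == em_dash)  # True
```

Both forms name the same code point, so the choice between them is a matter of convenience. The hexadecimal form matches the way code points are listed in Unicode charts.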

Character entities (or Named Character References) are abbreviated names for characters, such as &lt; for the less-than symbol. Character entities are predefined in markup languages such as HTML and XHTML as a convenience to authors because they may be easier to remember than Numeric Character References. HTML 4.01 defined 252 character entities. That number has grown to more than 2,000 in HTML5.
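Python's standard library ships lookup tables for both generations of entity names, which makes the difference in size easy to inspect. This is a minimal sketch: name2codepoint holds the HTML 4 names and html5 holds the HTML5 names, while the helper decode_reference is a hypothetical wrapper written for this example.

```python
import html
from html.entities import html5, name2codepoint


def decode_reference(ref: str) -> str:
    """Resolve a named or numeric character reference to its character."""
    return html.unescape(ref)


print(decode_reference("&lt;"))    # <
print(chr(name2codepoint["lt"]))   # <

# Size of the HTML 4 table of entity names.
print(len(name2codepoint))

# The html5 table stores names with their trailing semicolon; a few legacy
# names also appear without it, so count only the semicolon-terminated ones.
html5_names = [name for name in html5 if name.endswith(";")]
print(len(html5_names) > 2000)     # True
```

A named reference and a numeric reference for the same character decode identically, so &lt; and &#60; are interchangeable in a document.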

The remainder of this section lists only the most commonly used character references and entities. For additional character references, see the following resources:

W3C Character Entity Reference Chart

This visual chart organizes 488 characters in numerical order ...
