Unicode code points

Unicode was originally intended to be a 16-bit encoded character set, but it was soon recognized that 65,536 code positions would not be enough. It was therefore extended with 16 supplementary planes, bringing the total to more than a million code points (1,114,112, though not all of them are assigned, of course).
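To make the arithmetic concrete, here is a minimal Python sketch of how the code space divides into planes. It assumes the standard Unicode upper bound of U+10FFFF. The names `PLANE_SIZE`, `MAX_CODE_POINT`, and `plane_of` are illustrative and not part of any library.

```python
# Unicode code space: 17 planes (0-16) of 65,536 code points each.
PLANE_SIZE = 0x10000          # 65,536 positions per plane
MAX_CODE_POINT = 0x10FFFF     # highest code point Unicode defines

def plane_of(cp: int) -> int:
    """Return the plane number (0-16) that contains code point cp."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"U+{cp:X} is outside the Unicode code space")
    return cp // PLANE_SIZE

total = MAX_CODE_POINT + 1            # every code point, assigned or not
supplementary = total - PLANE_SIZE    # everything outside plane 0
print(f"{total:,} total, {supplementary:,} on supplementary planes")
print(plane_of(ord("A")), plane_of(0x1F600))   # plane 0 and plane 1
```

Integer division by the plane size is all it takes: the high bits of a code point name its plane, and the low 16 bits give its position within that plane.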

The first 65,536 positions in Unicode (U+0000 through U+FFFF, the range a 16-bit value can address) are referred to as the Basic Multilingual Plane (BMP). The BMP includes most of the more common characters in use, such as the scripts for Latin, Greek, Cyrillic, Devanagari, hiragana, katakana, Cherokee, and others, as well as mathematical and other miscellaneous symbols. Most commonly used ideographs are there, too, but because there are so many of them, many less common ideographs are assigned to the Supplementary Ideographic Plane (Plane 2) instead.
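A character belongs to the BMP exactly when its code point is at most U+FFFF. The following Python sketch checks that for one sample character from several of the scripts listed above, plus an ideograph from Plane 2. The helper `in_bmp` is a hypothetical name chosen for this example. `unicodedata.name` comes from Python's standard library and is used only to label the output.

```python
import unicodedata

BMP_LIMIT = 0xFFFF  # last code point of plane 0, the Basic Multilingual Plane

def in_bmp(ch: str) -> bool:
    """True if the single character ch lies in the BMP (U+0000-U+FFFF)."""
    return ord(ch) <= BMP_LIMIT

# Latin, Greek, Cyrillic, Devanagari, hiragana, katakana, Cherokee,
# a math symbol, a common CJK ideograph, and a Plane 2 ideograph.
samples = [0x0041, 0x03A9, 0x0416, 0x0915, 0x3042, 0x30A2,
           0x13A0, 0x2211, 0x6F22, 0x20000]
for cp in samples:
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch, '?'):<32} BMP={in_bmp(ch)}")
```

Every sample reports `BMP=True` except U+20000, the first code point of the Supplementary Ideographic Plane.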

Unicode was created with backward compatibility in mind. The first 256 code points in the BMP are identical to the Latin-1 character set, with the first 128 matching the established ASCII standard.
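The backward compatibility described above can be checked directly. This Python sketch relies on the standard `latin-1` and `ascii` codecs, which map each byte value to the code point with the same number. It confirms that every one of the 256 Latin-1 byte values decodes to the identically numbered Unicode code point, and that the first 128 do the same under ASCII.

```python
# Byte b under Latin-1 (ISO 8859-1) should decode to code point U+00bb.
latin1_ok = all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))

# ASCII covers only byte values 0-127, and they match the same code points.
ascii_ok = all(bytes([b]).decode("ascii") == chr(b) for b in range(128))

print(latin1_ok, ascii_ok)

# For example, "é" is U+00E9 in Unicode and byte 0xE9 in Latin-1.
print("\u00e9".encode("latin-1"))
```

Because the numbering lines up, text limited to ASCII or Latin-1 characters keeps the same character numbers when it is treated as Unicode.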
