Character Encoding in Brief

The last rule for applying font-family values can create some confusion, as HTTP response headers aren’t always under the control of stylists.

Every properly configured web server that runs an adequate implementation of HTTP (which is to say, nearly all of them) specifies the language and character set of each document that it sends to client hosts. Additional interfaces such as the meta element and the PHP header() function allow developers to alter or override those assignments on a case-by-case basis.
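To make the server-side assignment concrete, here is a minimal Python sketch of how a client might read the character set out of a Content-Type response header. The function name charset_from_content_type and the sample header values are assumptions chosen for illustration; they do not come from any particular server.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None.

    Hypothetical helper: it relies on the standard library's MIME
    header parsing rather than on hand-written string splitting.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset()


# Sample header values (assumed, not taken from a real response).
declared = charset_from_content_type("text/html; charset=UTF-8")
undeclared = charset_from_content_type("text/html")

print(declared)    # utf-8
print(undeclared)  # None
```

When the header carries no charset parameter, as in the second call, the client has to fall back on some other declaration, which is where an interface such as the meta element comes in.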

What Is Character Encoding?

Hopefully you’re familiar with the concept of bits and bytes: a bit ultimately represents the state of a single circuit in system RAM, and a byte is equal to eight of those in a logical row, which can be arranged in one of 256 ways.
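The figure of 256 follows from the arithmetic: each of the eight bits can independently be unset or set, which gives 2 to the 8th power distinct arrangements. A short Python sketch, with variable names chosen only for illustration:

```python
BITS_PER_BYTE = 8

# Each bit has two states, so a byte has 2 ** 8 arrangements.
arrangements = 2 ** BITS_PER_BYTE

# The first and last arrangements, written out as eight bits.
lowest = format(0, "08b")
highest = format(arrangements - 1, "08b")

print(arrangements)  # 256
print(lowest)        # 00000000
print(highest)       # 11111111
```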

The technicians of the English-speaking world have grown accustomed to the representation of a single Latin character—or glyph, in typography jargon—within a single byte. That example has been followed for other alphabets as well.
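As a rough illustration of the one-character-per-byte habit, the following Python sketch encodes a Latin string as ASCII and confirms that each character occupies exactly one byte. The Cyrillic line uses ISO 8859-5 purely as an assumed example of a single-byte encoding for another alphabet; the text above does not name a specific one.

```python
text = "Good Parts"

# ASCII assigns every character in this string a single byte.
encoded = text.encode("ascii")
one_byte_each = len(encoded) == len(text)

# The bit pattern behind each character, eight bits apiece.
bit_patterns = [format(byte, "08b") for byte in encoded]

# Assumed example of another alphabet handled the same way:
# ISO 8859-5 stores a Cyrillic letter in one byte.
cyrillic_length = len("Ж".encode("iso8859_5"))

print(one_byte_each)    # True
print(bit_patterns[0])  # 01000111
print(cyrillic_length)  # 1
```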

Consider the example of Morse Code: its character representations are composed of variable-length series of dits (analogous to unset bits) and dahs (analogous to set bits). In this case, the assignment of a character’s unique sequence of dits and dahs is informed by its typical frequency in telegraphic messages.
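The analogy can be sketched in Python by writing dits as 0 and dahs as 1. The table below is a small excerpt of International Morse code, and the helper name to_bits is hypothetical. Note that the sketch ignores the pauses that separate letters in real telegraphy, so it illustrates sequence length only.

```python
# A few letters of International Morse code: "." is a dit, "-" a dah.
MORSE = {
    "E": ".",
    "T": "-",
    "A": ".-",
    "N": "-.",
    "J": ".---",
    "Q": "--.-",
}


def to_bits(code):
    """Rewrite a Morse sequence with dits as 0 and dahs as 1."""
    return code.replace(".", "0").replace("-", "1")


# Common letters such as E and T get the shortest sequences;
# rarer letters such as J and Q get longer ones.
for letter, code in MORSE.items():
    print(letter, code, to_bits(code), len(code))
```

Here E needs one symbol while Q needs four, which is the variable-length, frequency-informed assignment described above.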

In the guts of a computer, however, a more systematic means of assignment can be afforded. The definition of “systematic” encompasses the following questions: ...
