UTF-7

You might run across some other, less common Unicode transformation formats. The most widely encountered of these is UTF-7, which was designed for 7-bit ASCII environments that can't handle 8-bit characters. In particular, the original version of the Simple Mail Transfer Protocol (SMTP), which is still in common use, doesn't work with 8-bit character values and so requires an approach like this one.

UTF-7 works by using a scheme similar to the numeric-character reference scheme employed in HTML and XML. Most of the ASCII characters are just themselves, but a few are used to signal sequences of characters that specify otherwise unrepresentable Unicode values. Thus, unlike the other Unicode transformation formats, UTF-7 is a stateful encoding, with ...
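
To make the escaping idea concrete, here is a minimal Python sketch. It relies on Python's built-in "utf-7" codec and on the sequence layout described in RFC 2152 (a "+", then the UTF-16 code units in base64 without padding, then a closing "-"), which the text above does not spell out. The helper name shifted_sequence and the sample strings are hypothetical, chosen only for illustration.

```python
import base64


def shifted_sequence(text: str) -> bytes:
    """Hypothetical helper: build one UTF-7 escape sequence by hand.

    Assumes the RFC 2152 layout: '+', then the big-endian UTF-16 code
    units in base64 with the '=' padding stripped, then a closing '-'.
    """
    utf16 = text.encode("utf-16-be")
    b64 = base64.b64encode(utf16).rstrip(b"=")
    return b"+" + b64 + b"-"


# Round-trip a few illustrative strings through the built-in codec.
samples = ["Hi Mom", "Hi Mom -\u263a-!", "1 + 1 = 2"]
for s in samples:
    encoded = s.encode("utf-7")
    # Every output byte fits in 7 bits, which is the point of the format.
    assert all(byte < 0x80 for byte in encoded)
    assert encoded.decode("utf-7") == s
    print(repr(s), "->", encoded)

# An example byte string taken from RFC 2152: the smiley U+263A travels
# as the escape sequence '+Jjo-' between ordinary ASCII characters.
rfc_example = b"Hi Mom -+Jjo--!"
decoded = rfc_example.decode("utf-7")
print(repr(decoded))

# The hand-built sequence decodes to the same character.
smiley = shifted_sequence("\u263a")
print(smiley, "->", repr(smiley.decode("utf-7")))
```

Note that a literal "+" also has to be escaped, since it is the character that opens a sequence. This is why a decoder cannot interpret a byte without knowing whether it is currently inside such a sequence.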
