Wide and Multibyte Characters

The familiar char type is sometimes called a narrow character, as opposed to wchar_t, which is a wide character. The key difference between a narrow and a wide character is that a wide character can represent any single character in any character set that an implementation supports. A narrow character, on the other hand, might be too small to represent all characters, so multiple narrow char objects can make up a single, logical character called a multibyte character.

Beyond some minimal requirements for the character sets (see Chapter 1), the C++ standard is purposely open-ended and imposes few restrictions on an implementation. Among the basic behavioral requirements, converting a narrow character to a wide character must produce an equivalent character, and converting that wide character back to a narrow character must restore the original. The open nature of the standard gives compiler and library vendors wide latitude. For example, a compiler for Japanese customers might support a variety of Japanese Industrial Standard (JIS) character sets, but not any European character sets. Another vendor might support multiple ISO 8859 character sets for Western and Eastern Europe, but not any Asian multibyte character sets. Although the standard defines universal characters in terms of the Unicode (ISO/IEC 10646) standard, it does not require any support for Unicode character sets.

This section discusses some of the broad issues in dealing with wide and multibyte ...
