12.9. Character Set Encoding

Historically, computers have represented textual data as strings of single-byte characters. A single byte allows for 256 different characters, which is more than enough for English and was adapted to cover most European languages. Many Asian languages, however, have writing systems with far more than 256 characters. To cope with this larger range, we have multibyte encodings, which use multiple bytes instead of a single byte to represent one visual character.

PHP scripts are written in standard, single-byte ASCII, but it's possible to embed strings of multibyte text in a script. Unfortunately, PHP's text manipulation functions assume single-byte encoding. A string encoded to use two ...
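The mismatch between bytes and visual characters can be seen in a minimal sketch. It assumes UTF-8 as the multibyte encoding, which the text above does not specify, and utf8_char_count() is a hypothetical helper written for this illustration, not a built-in PHP function. The multibyte character is written with hex escapes so the script itself stays plain ASCII.

```php
<?php
// Assumption: the string is UTF-8. In UTF-8, "é" takes two bytes: 0xC3 0xA9.
$word = "caf\xC3\xA9";

// strlen() counts bytes, not visual characters.
$byteCount = strlen($word);

// Hypothetical helper: count UTF-8 characters by counting every byte
// that is not a continuation byte (continuation bytes look like 10xxxxxx).
function utf8_char_count($s)
{
    $count = 0;
    $length = strlen($s);
    for ($i = 0; $i < $length; $i++) {
        if ((ord($s[$i]) & 0xC0) != 0x80) {
            $count++;
        }
    }
    return $count;
}

$charCount = utf8_char_count($word);

// substr() cuts on byte boundaries, so it can split a character in half.
$truncated = substr($word, 0, 4);

echo $byteCount, " bytes, ", $charCount, " characters\n"; // 5 bytes, 4 characters
echo bin2hex($truncated), "\n";                           // 636166c3
```

The last line shows the practical risk: the truncated string ends in a lone 0xC3 byte, which is no longer a complete character in the assumed encoding.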
