Handling Character Sets Safely

Although English is currently the most pervasive language on Web sites across the Internet, other languages such as Chinese (Mandarin), Spanish, Japanese, and French hold a significant share. (I would cite a specific reference for this list of languages, but the Internet being what it is, the list could easily be surpassed by lolcat, l33t, or Klingon by the time you read this, none of which invalidates the problem of character encoding.) Consequently, Web browsers must be able to support non-English writing systems, whether a system merely adds accented characters and ligatures or relies on complex ideograms. One of the most common encoding schemes used on the Web is UTF-8.
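To make the point about UTF-8 concrete, here is a minimal Python sketch showing how the kinds of characters mentioned above (a plain ASCII letter, an accented letter, a ligature, and an ideogram) become byte sequences of different lengths. The sample characters and the helper name utf8_hex are chosen purely for illustration and do not come from the book.

```python
# Illustrative sample characters (assumed for this sketch, not from the text):
# an ASCII letter, an accented letter, a ligature, and a CJK ideogram.
SAMPLES = {
    "ASCII letter a": "a",
    "accented e (U+00E9)": "\u00e9",
    "fi ligature (U+FB01)": "\ufb01",
    "ideogram (U+4E2D)": "\u4e2d",
}


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return ch.encode("utf-8").hex(" ")


if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        encoded = ch.encode("utf-8")
        print(f"{label}: {len(encoded)} byte(s): {utf8_hex(ch)}")
```

The ASCII letter occupies a single byte, the accented letter two, and the ligature and ideogram three each, so one character does not always correspond to one byte.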

Character encoding is a ...
