Internationalization

Use Unicode inside your program. Do any translation to and from other character sets at your interfaces to the outside world. See Chapter 15.
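As a rough illustration of translating at the boundaries, the sketch below decodes incoming bytes into characters, works on characters, and encodes again on the way out. It assumes a Perl that ships the Encode module (5.8 or later, which is newer than the Perl this text describes). The sample data and the choice of ISO-8859-1 as the outside encoding are invented for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might arrive from outside: "café" in ISO-8859-1 (assumed input).
my $latin1_bytes = "caf\xE9";

# Input boundary: turn bytes into a character string.
my $text = decode('ISO-8859-1', $latin1_bytes);

# Inside the program: operate on characters, not bytes.
my $upper = uc $text;

# Output boundary: turn characters back into bytes, here as UTF-8.
my $utf8_bytes = encode('UTF-8', $upper);

printf "%d characters, %d bytes\n", length($upper), length($utf8_bytes);
# prints "4 characters, 5 bytes"
```

The character count and byte count differ because the accented letter takes two bytes in UTF-8, which is one reason to keep the two notions apart until the last moment.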

Outside the world of Unicode, you should assume little about character sets and nothing about the ord values of characters. Do not assume that the alphabetic characters have sequential ord values. The lowercase letters may come before or after the uppercase letters; the lowercase and uppercase may be interlaced so that both a and A come before b; the accented and other international characters may be interlaced so that ä comes before b.
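One way to avoid leaning on ord values is to ask Perl about the character rather than doing arithmetic on its code. The following is only a sketch: the subroutine names are made up for the comparison, and the "fragile" versions are shown because they happen to work in ASCII while failing on a character set such as EBCDIC, where the alphabet is not one contiguous run.

```perl
use strict;
use warnings;

# Fragile: assumes 'a'..'z' occupy one unbroken run of ord values.
sub is_lower_by_ord {
    my ($ch) = @_;
    return (ord($ch) >= ord('a') && ord($ch) <= ord('z')) ? 1 : 0;
}

# Sturdier: let the POSIX character class decide.
sub is_lower_by_class {
    my ($ch) = @_;
    return $ch =~ /\A[[:lower:]]\z/ ? 1 : 0;
}

# Fragile: assumes each uppercase letter sits exactly 32 below its lowercase.
sub upcase_by_arithmetic {
    my ($ch) = @_;
    return chr(ord($ch) - 32);
}

# Sturdier: the built-in knows the platform's case mapping.
sub upcase_builtin {
    my ($ch) = @_;
    return uc $ch;
}

for my $ch (qw(a q z)) {
    printf "%s: lower=%d upper=%s\n",
        $ch, is_lower_by_class($ch), upcase_builtin($ch);
}
```

The same reasoning applies to sorting: comparing with cmp gives an order based on ord values, so do not treat that order as alphabetical outside a character set you control.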

If your program is to operate on a POSIX system (a rather large assumption), consult the perllocale manpage for more information about POSIX locales. Locales affect character sets and encodings, date and time formatting, and other conventions. Proper use of locales will make your program a little bit more portable, or at least more convenient and native-friendly for non-English users. But be aware that locales and Unicode don't mix well yet.
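A minimal sketch of opting in to the user's locale might look like the following. It assumes a POSIX system with the standard POSIX module. The word list is invented, and no output is shown because the result depends on whichever locale the environment selects.

```perl
use strict;
use warnings;
use POSIX qw(setlocale strftime LC_ALL);

# An empty string asks for the locale named by the environment
# (LANG, LC_ALL, and so on); fall back to "C" if that request fails.
my $locale = setlocale(LC_ALL, "") || setlocale(LC_ALL, "C");

# Day and month names follow the locale's LC_TIME category.
my $date = strftime("%A, %d %B %Y", gmtime(0));
print "Locale $locale: $date\n";

# Within the scope of "use locale", cmp and sort follow LC_COLLATE
# rather than raw ord values.
my @words = qw(banana Apple cherry apple);
my @sorted;
{
    use locale;
    @sorted = sort { $a cmp $b } @words;
}
print "@sorted\n";
```

Keeping "use locale" inside a small block limits its effect to the comparisons that are meant to be locale-aware, leaving the rest of the program with the default behavior.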
