Appendix E. Common Content Encodings
In an ideal world, the only character encodings (or, loosely,
“character sets”) that you’d ever see would be UTF-8 (utf-8), plus
Latin-1 (iso-8859-1) for all those legacy documents.
However, the encodings listed below exist and can be found on the Web.
They appear in order of their English names, with the lefthand
side being the value you’d get returned from $response->content_charset.
The complete list of character sets can be found at
http://www.iana.org/assignments/character-sets.
Value | Encoding |
us-ascii | ASCII plain (just characters 0x00-0x7F) |
asmo-708 | Arabic ASMO-708 |
iso-8859-6 | Arabic ISO |
dos-720 | Arabic MSDOS |
windows-1256 | Arabic MSWindows |
iso-8859-4 | Baltic ISO |
windows-1257 | Baltic MSWindows |
iso-8859-2 | Central European ISO |
ibm852 | Central European MSDOS |
windows-1250 | Central European MSWindows |
hz-gb-2312 | Chinese Simplified (HZ) |
gb2312 | Chinese Simplified (GB2312) |
euc-cn | Chinese Simplified EUC |
big5 | Chinese Traditional (Big5) |
cp866 | Cyrillic DOS |
iso-8859-5 | Cyrillic ISO |
koi8-r | Cyrillic KOI8-R |
koi8-u | Cyrillic KOI8-U |
windows-1251 | Cyrillic MSWindows |
iso-8859-7 | Greek ISO |
windows-1253 | Greek MSWindows |
iso-8859-8-i | Hebrew ISO Logical |
iso-8859-8 | Hebrew ISO Visual |
dos-862 | Hebrew MSDOS |
windows-1255 | Hebrew MSWindows |
euc-jp | Japanese EUC-JP |
iso-2022-jp | Japanese JIS |
shift_jis | Japanese Shift-JIS |
iso-2022-kr | Korean ISO |
euc-kr | Korean Standard |
windows-874 | Thai MSWindows |
iso-8859-9 | Turkish ISO |
windows-1254 | Turkish MSWindows |
utf-8 | Unicode expressed as UTF-8 |
utf-16 | Unicode expressed as UTF-16 |
windows-1258 | Vietnamese MSWindows |
viscii ... |
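
As an illustration of how a value from the lefthand column might be used, here is a minimal sketch that reads the charset from a response and hands it to the Encode module to turn the raw bytes into Perl characters. The helper name decode_body, the Latin-1 fallback, and the hand-built response are assumptions made for this example, not something this appendix prescribes. The sketch also assumes that content_charset may return the name in uppercase (as recent versions of HTTP::Message appear to do), so it lowercases the value to match the table.

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding);
use HTTP::Response;

# Hypothetical helper (not from the book): decode a response body using
# the charset the response declares, falling back to Latin-1 (an
# assumption of this sketch) when none can be determined.
sub decode_body {
    my ($response) = @_;

    # content_charset may return the name uppercased (e.g. "KOI8-R"),
    # so normalize it to the lowercase form used in the table.
    my $charset = lc( $response->content_charset || 'iso-8859-1' );

    # find_encoding returns undef for names that Encode does not know.
    my $encoding = find_encoding($charset)
        or die "Encode does not recognize charset '$charset'\n";

    return ( $charset, $encoding->decode( $response->content ) );
}

# Build a response by hand so the sketch runs without network access.
my $text  = "\x{41F}\x{440}\x{438}\x{432}\x{435}\x{442}";   # Russian "Privet"
my $bytes = encode( 'koi8-r', $text );
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/plain; charset=koi8-r' ],
    $bytes,
);

my ( $charset, $decoded ) = decode_body($response);
printf "%s: %d bytes decoded to %d characters\n",
    $charset, length $bytes, length $decoded;
```

In real code the response would come from an LWP request rather than being constructed by hand. HTTP::Message also offers a decoded_content method that performs a similar decode in one call; the explicit version here is only meant to show where the charset name fits in.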