8. Unicode

The world used to be so simple. Everything fit into 7 bits, and you didn’t have to worry about special characters or character sets. Back then, strings were sequences of bytes, and each byte represented its own character. People got used to the idea that bytes and characters were the same thing, and everyone formed really bad habits that still infect programming. To keep the two ideas apart, we’re going to call bytes octets instead.

Now you know that characters and bytes aren’t the same thing. To get all the fancy characters you need, or even the pieces you need to build new characters, you use Unicode, whose character repertoire is the Universal Character Set (sometimes called just UCS). This book is much too small to go through all of the details of Unicode, but you need to know at least a little ...

