O'Reilly logo

Python 2.6 Text Processing Beginner's Guide by Jeff McNeil

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Unicode

We've only looked at two legacy encodings here and it's already apparent we have a problem. Consider all of the additional scripts and variations that exist. We've not touched any of the Japanese Kanji or variations on both the Latin and the Cyrillic alphabets. It's just not possible to fit all of the world's characters into a single byte.

The Unicode specification, as it currently stands, allows for over one million different code points (1,112,064 to be exact). That's more than enough space to hold all of the world's current scripts as well as historic characters. Currently, only about 20 percent of the Unicode space has been assigned.

Let's take a brief overview of Unicode in order to provide a solid understanding of its strengths.

Using ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required