Techniques and Data Structures for Handling Unicode Text

Chapter 13. Techniques and Data Structures for Handling Unicode Text

We've come a long way. We've looked at the general characteristics of Unicode, the Unicode encoding forms, combining character sequences and normalization forms, and the various properties in the Unicode Character Database. We've taken a long tour through the Unicode character repertoire, looking at the unique requirements of the various writing systems, including such things as bidirectional character reordering, Indic consonant clusters, diacritical stacking, and the interesting challenges associated with representing the Han characters. Now we've finally gotten to the point where we can talk about actually doing things with Unicode text.

In Part III, we'll delve into the ...

Unicode Demystified by Richard Gillam
