ICU Internationalization Extension

SQLite provides full support for Unicode text values. Unicode provides a way to encode many different character representations, allowing a string of bytes to represent written characters, glyphs, and accents from a multitude of languages and writing systems. What Unicode does not provide is any information or understanding of the sorting rules, capitalization rules, or equivalence rules and customs of a given language or location.

This is a problem for pattern matching, sorting, or anything else that depends on comparing text values. For example, most text-sorting systems ignore case differences between words. Some languages also ignore certain accent marks, but those rules often depend on the specific accent mark and character. Occasionally, the rules and conventions used within a language change from location to location. By default, the only character set SQLite understands is 7-bit ASCII. Any character with a code value of 128 or above is treated as an opaque binary value, with no awareness of capitalization or equivalence conventions. While this is often sufficient for English, it is usually insufficient for other languages.
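The ASCII-only behavior is easy to see with the two collations every SQLite build includes, BINARY (the default) and NOCASE. The following is a minimal sketch using Python's standard sqlite3 module. The table name, column name, and sample words are made up for illustration, and the results assume the database uses SQLite's default UTF-8 text encoding.

```python
import sqlite3

# In-memory database; the table and sample words are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (w TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("apple",), ("Zebra",), ("éclair",), ("Éclair",), ("banana",)],
)

# Default BINARY collation: values are compared byte by byte, so
# uppercase ASCII sorts before lowercase ASCII, and the multi-byte
# accented characters sort after every ASCII letter.
binary_order = [r[0] for r in conn.execute("SELECT w FROM words ORDER BY w")]

# Built-in NOCASE collation: folds only the 26 ASCII letters.
nocase_order = [
    r[0] for r in conn.execute("SELECT w FROM words ORDER BY w COLLATE NOCASE")
]

# NOCASE treats 'a' and 'A' as equal, but not the accented pair.
ascii_equal = conn.execute("SELECT 'a' = 'A' COLLATE NOCASE").fetchone()[0]
accent_equal = conn.execute("SELECT 'é' = 'É' COLLATE NOCASE").fetchone()[0]

print(binary_order)   # ['Zebra', 'apple', 'banana', 'Éclair', 'éclair']
print(nocase_order)   # ['apple', 'banana', 'Zebra', 'Éclair', 'éclair']
print(ascii_equal, accent_equal)  # 1 0
```

Even with NOCASE, the accented words are ordered by their raw bytes and land after "Zebra", which is not where a French dictionary would put them.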

For more complete internationalization support, you’ll need to build SQLite with the ICU extension enabled. The International Components for Unicode project is an open-source library that implements a vast number of language-related functions. These functions are customized for different locales. The SQLite ICU extension allows ...
