Unicode Text

Unicode text is text in UTF-16 encoding, as opposed to a string, which has the MacRoman encoding. Unicode is the native system-level encoding of Mac OS X, so text supplied by the system is often Unicode text rather than a string. For example:

tell application "Finder" to set x to (get name of disk 1)
class of x -- Unicode text

Similarly, some Mac OS X-native applications, such as TextEdit, return text values as Unicode text. Unicode is capable of expressing tens of thousands of characters, and in its fullest form will express about a million, embracing every character of every written language in history. Eventually we may expect that AppleScript will become completely Unicode-savvy; all AppleScript text will be Unicode text, and the old string type will fade into oblivion.

Unicode text is basically indistinguishable from a string; the differences between them are handled transparently. Whatever you can do to a string, you can do to Unicode text. If you get an element of a Unicode text value, the result is Unicode text. If you concatenate Unicode text and a string, the result is Unicode text; but if you concatenate a string and Unicode text, you get a string. In other words, the left operand determines the class of the result, which is troublesome and might change in a future version of AppleScript. You can explicitly coerce between a string and Unicode text, and AppleScript implicitly coerces for you as appropriate.
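The rules just described can be checked with a short script. This is a minimal sketch: the variable names `u` and `s` are hypothetical, and the classes noted in the comments are the ones the paragraph above predicts for the AppleScript version this chapter covers. AppleScript 2.0 and later merge string and Unicode text into a single type, so there every line logs `text` instead.

```applescript
-- Hypothetical variables for illustration only.
set u to "hello" as Unicode text -- explicit coercion from a string
set s to "world" -- a string literal

-- Each log line appears in Script Editor's event log.
log (class of (character 1 of u)) -- expected: Unicode text (an element of Unicode text)
log (class of (u & s)) -- expected: Unicode text (Unicode text on the left)
log (class of (s & u)) -- expected: string (string on the left)
log (class of (u as string)) -- expected: string (explicit coercion back)
```

Putting the Unicode text on the left of `&` is one way to make sure a concatenation keeps its Unicode characters under these rules.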

Nevertheless, Unicode text is currently still a second-class citizen in AppleScript, and can be hard to work with. You ...
