Unicode Text

Like the Macintosh itself, the AppleScript string class has long been bedeviled by the existence of text encodings that represent characters outside its own native encoding, MacRoman. With the coming of Mac OS X, this problem is essentially solved at the system level: text is now Unicode. Unicode expresses tens of thousands of characters in a single massive encoding, and in its fullest form will express about a million characters, embracing every character of every written language in history. Unfortunately, AppleScript predates Mac OS X, and the string class is still its primary text class. Over the years, various secondary classes have been fudged into AppleScript in an attempt to increase a string's representational power and to improve AppleScript's compatibility with text in the world around it. At the moment, the most important of these is the Unicode text class, which uses the UTF-16 encoding.

Text supplied by the system is often Unicode text rather than a string. For example:

tell application "Finder" to set x to (get name of disk 1)
class of x -- Unicode text

Similarly, some Mac OS X-native applications, such as TextEdit, return text values as Unicode text.
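
The same check can be tried against TextEdit. This is only a sketch: it assumes TextEdit is running with at least one document open, and the variable name t is arbitrary. On the AppleScript versions described here, the reported class should be Unicode text. Later versions (AppleScript 2.0 and up) unify the text classes and report text instead.

```applescript
-- Assumption: TextEdit is running and has at least one open document.
tell application "TextEdit" to set t to (get text of document 1)
class of t -- expected to be Unicode text on the AppleScript versions described here
```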

The trouble is that Unicode text remains very much a second-class citizen within AppleScript. Perhaps someday all AppleScript text will be Unicode text, but that day has not yet come. A literal string (the stuff between quotes in your code) is still a string, not Unicode text. Thus, you can't even enter ...
