Item 38. Write in Unicode

You may work in English, but these days it's no great surprise if some of your coworkers or customers are more comfortable in French, Chinese, or Amharic. One of the most underrated advantages of XML is its internationalization support. Much of this is a direct result of its dependence on Unicode. In effect, every XML document is read in Unicode. Even if the document is written in a different character set such as ISO-8859-1 or SJIS, the parser converts it to Unicode on input. Thus it behooves you to know how to properly process Unicode data.
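The conversion on input can be seen in a small sketch. It assumes Python 3 and the standard library's `xml.etree.ElementTree` parser; the element name `word` and the sample text are made up for illustration. The same document is serialized once as ISO-8859-1 and once as UTF-8, and the parser hands back identical Unicode strings for both.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content; the "é" is encoded differently in each serialization.
text = "café"

# The same logical document, written in two different encodings.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><word>' + text + "</word>"
).encode("iso-8859-1")
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><word>' + text + "</word>"
).encode("utf-8")

# The parser reads the encoding declaration and decodes the bytes to Unicode.
latin1_word = ET.fromstring(latin1_doc).text
utf8_word = ET.fromstring(utf8_doc).text

# The bytes on disk differ, but the application sees the same Unicode string.
print(latin1_doc == utf8_doc)    # False
print(latin1_word == utf8_word)  # True
```

Once parsing is done, the original encoding no longer matters to the application; everything downstream works with Unicode strings.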

How difficult this is varies greatly from one language or environment to the next. In Python 2.2 it's relatively easy. In Java it's not too hard, but there are some pitfalls laid ...
