Handling Special Characters and Encoding

A simple topic, but one that trips up many new programmers, is character encoding and parsing. The basic problem is characters embedded in an XML file that are not valid in the encoding the document specifies (or in the default encoding the parser assumes when none is specified). This situation will cause XML parsers to fail. It happens more than it should because programmers tend to default to the UTF-8 character encoding scheme. Although UTF-8 is a good choice, it won't always be the correct default encoding for an XML document.
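As a rough illustration of that failure, the following sketch parses the same element twice with the JAXP parser that ships with the JDK: once with no encoding declaration, so the parser assumes UTF-8, and once with an ISO-8859-1 declaration that matches the bytes. The class name, helper methods, and sample element are hypothetical and are not taken from the book's listings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EncodingMismatchDemo {

    // Hypothetical sample: bytes saved as ISO-8859-1 with no encoding declaration.
    // The e-acute (U+00E9) is written as the single byte 0xE9.
    static byte[] undeclaredLatin1() {
        return "<name>caf\u00e9</name>".getBytes(StandardCharsets.ISO_8859_1);
    }

    // Same content, but the XML declaration names the encoding actually used.
    static byte[] declaredLatin1() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Returns the text of the root element, or null if the parser rejects the input.
    static String parseRootText(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Covers both SAXException and IOException; the exact type depends on the parser.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("undeclared: " + parseRootText(undeclaredLatin1()));
        System.out.println("declared:   " + parseRootText(declaredLatin1()));
    }
}
```

With the JDK's bundled parser, the first call should come back as null, because the byte 0xE9 followed by `<` is not a legal UTF-8 sequence, while the second should return the text intact. Other parsers may report the error differently.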

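One problem with UTF-8 is Latin1 incompatibility. What this means in practice is that Latin1 (ISO-8859-1) characters in the range 160-255 can cause problems. These characters lie outside ASCII, which covers only 0-127: Latin1 stores each of them as a single byte, and such a byte on its own is not a valid UTF-8 sequence. Let's create an example. The file, shown in Listing 9.8, should be saved as webapps/xmlbook/chapter9/SpecialChars.xml ...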
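Separately from Listing 9.8, the byte-level mismatch can be checked directly in Java. This is a minimal sketch using only the standard `java.nio.charset` classes; the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Latin1VersusUtf8 {

    // Strict check: true only if the bytes form a complete, well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Counts how many single byte values in 160-255 are rejected when read as UTF-8.
    static int countInvalidAsUtf8() {
        int invalid = 0;
        for (int value = 160; value <= 255; value++) {
            if (!isValidUtf8(new byte[] {(byte) value})) {
                invalid++;
            }
        }
        return invalid;
    }

    // Number of bytes each encoding needs for the same text.
    static int latin1Length(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // code point 233, inside the 160-255 range
        System.out.println("Latin1 bytes: " + latin1Length(eAcute));
        System.out.println("UTF-8 bytes:  " + utf8Length(eAcute));
        System.out.println("Rejected single bytes in 160-255: " + countInvalidAsUtf8());
    }
}
```

The point of the sketch is that the same character needs one byte in Latin1 and two in UTF-8, so a file written in one encoding and read as the other does not line up byte for byte.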
