CDATA Sections

The golden rule of handling CDATA sections is this: ignore them. When writing code to process XML, pretend CDATA sections do not exist, and everything will work just fine. The content of a CDATA section is plain text. It will be reported to your application as plain text, just like any other text, whether enclosed in a CDATA section, escaped with character references, or typed out literally when escaping is not necessary. For example, these two example elements are exactly the same as far as anything in your code should know or care:

<example><![CDATA[<?xml version="1.0"?>
<root>
  Hello!
</root>]]></example>
<example>&lt;?xml version="1.0"?>
&lt;root>
  Hello!
&lt;/root></example>
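
To see this equivalence from the application's side, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The choice of parser and the variable names are illustrative assumptions, not something the text prescribes; other parsers should report the same text.

```python
import xml.etree.ElementTree as ET

# The two example elements from the text, written as Python strings.
CDATA_FORM = (
    '<example><![CDATA[<?xml version="1.0"?>\n'
    '<root>\n'
    '  Hello!\n'
    '</root>]]></example>'
)
ESCAPED_FORM = (
    '<example>&lt;?xml version="1.0"?>\n'
    '&lt;root>\n'
    '  Hello!\n'
    '&lt;/root></example>'
)

# The parser hands back plain text in both cases; nothing in the
# resulting element records whether a CDATA section was used.
cdata_text = ET.fromstring(CDATA_FORM).text
escaped_text = ET.fromstring(ESCAPED_FORM).text

same = cdata_text == escaped_text
print(same)
```

Code written against the parsed text behaves identically for both inputs, which is the practical meaning of "ignore them."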

Do not write programs or XML documents that depend on knowing the difference between the two. Parsers rarely (and never reliably) inform you of the difference. Furthermore, passing such documents through a processing chain often removes the CDATA sections completely, leaving the content intact but represented differently, for instance with numeric character references standing in for characters that cannot be serialized literally. CDATA sections are a minor convenience for human authors, nothing more. Do not treat them as markup.
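
As a rough illustration of a processing chain dropping CDATA sections, the sketch below parses a document and writes it back out with Python's xml.etree.ElementTree. This is only one possible chain, chosen here as an assumption for the example; the sample document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical input that wraps markup-like text in a CDATA section.
ORIGINAL = '<example><![CDATA[<root>Hello & goodbye</root>]]></example>'

# Parse, then serialize again: a one-step "processing chain".
element = ET.fromstring(ORIGINAL)
serialized = ET.tostring(element, encoding="unicode")

# The CDATA section does not survive the round trip; the same text
# now appears with its special characters escaped instead.
print(serialized)

# Reading the serialized form back yields the same text content.
reparsed_text = ET.fromstring(serialized).text
print(reparsed_text == element.text)
```

The content is preserved, but any program that looked for the literal CDATA delimiters in the output would fail after this step.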

This also means you should not attempt to nest one XML (or HTML) document inside another using CDATA sections. XML documents are not designed to nest inside one another. The correct solution to this problem is to use namespaces to sort out which markup is which, rather than trying ...
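
The contrast between the two approaches can be sketched as follows, again with Python's xml.etree.ElementTree. The order vocabulary and its namespace URI (http://example.com/ns/order) are hypothetical, invented for this illustration; only the XHTML namespace URI is real.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used for lookups; the "o" URI is made up.
NS = {
    "o": "http://example.com/ns/order",
    "h": "http://www.w3.org/1999/xhtml",
}

# Embedded markup kept as real markup, distinguished by namespace.
NAMESPACED = (
    '<order xmlns="http://example.com/ns/order">'
    '<note xmlns:h="http://www.w3.org/1999/xhtml">'
    '<h:p>Ship <h:em>today</h:em></h:p>'
    '</note>'
    '</order>'
)
root = ET.fromstring(NAMESPACED)
emphasized = root.find("o:note/h:p/h:em", NS).text  # reachable as an element

# The same markup stuffed into a CDATA section is just a string.
IN_CDATA = (
    '<order xmlns="http://example.com/ns/order">'
    '<note><![CDATA[<p>Ship <em>today</em></p>]]></note>'
    '</order>'
)
note = ET.fromstring(IN_CDATA).find("o:note", NS)
child_count = len(note)   # no child elements were parsed
note_text = note.text     # the "markup" is only character data

print(emphasized, child_count, note_text)
```

In the namespaced version the parser exposes the embedded elements directly; in the CDATA version the application would have to parse the string a second time on its own.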
