CDATA Sections
The golden rule of handling CDATA sections is this: ignore them. When writing code to process XML, pretend CDATA sections do not exist, and everything will work just fine. The content of a CDATA section is plain text. It will be reported to your application as plain text, just like any other text, whether enclosed in a CDATA section, escaped with character references, or typed out literally when escaping is not necessary. For example, these two example elements are exactly the same as far as anything in your code should know or care:
<example><![CDATA[<?xml version="1.0"?> <root> Hello! </root>]]></example>
<example>&lt;?xml version="1.0"?&gt; &lt;root&gt; Hello! &lt;/root&gt;</example>
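The equivalence can be checked with a short script. The following is a minimal sketch that assumes Python's standard `xml.etree.ElementTree` parser; the variable names are chosen only for illustration. It parses the same content written once inside a CDATA section and once with escaped markup characters, then compares the text the parser reports.

```python
import xml.etree.ElementTree as ET

# The same content written two ways: inside a CDATA section,
# and with the markup characters escaped as entity references.
with_cdata = '<example><![CDATA[<?xml version="1.0"?> <root> Hello! </root>]]></example>'
with_escapes = '<example>&lt;?xml version="1.0"?&gt; &lt;root&gt; Hello! &lt;/root&gt;</example>'

# In both cases the parser hands back ordinary character data.
text_from_cdata = ET.fromstring(with_cdata).text
text_from_escapes = ET.fromstring(with_escapes).text

print(text_from_cdata == text_from_escapes)  # True
print(text_from_cdata)  # <?xml version="1.0"?> <root> Hello! </root>
```

Nothing in the parsed tree records which of the two spellings was used, which is why application code has no reliable way to tell them apart.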
Do not write programs or XML documents that depend on knowing the difference between the two. Parsers rarely (and never reliably) inform you of the difference. Furthermore, passing such documents through a processing chain often removes the CDATA sections completely, leaving only the content intact but represented differently, for instance, with numeric character references representing the unserializable characters. CDATA sections are a minor convenience for human authors, nothing more. Do not treat them as markup.
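A simple parse-and-reserialize round trip illustrates how a processing chain can drop CDATA sections. This sketch again assumes Python's standard `xml.etree.ElementTree`; other tools may serialize differently, and the sample document is invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Input uses a CDATA section and contains a non-ASCII character (é).
source = '<example><![CDATA[<p>caf\u00e9</p>]]></example>'
element = ET.fromstring(source)

# Serializing to a Unicode string: the CDATA section is gone and the
# markup characters come back as entity references instead.
as_text = ET.tostring(element, encoding="unicode")
print(as_text)

# Serializing to ASCII bytes: the character that cannot be encoded
# is written as a numeric character reference.
as_ascii = ET.tostring(element, encoding="us-ascii")
print(as_ascii)

# The content itself survives the round trip unchanged.
print(ET.fromstring(as_ascii).text == element.text)  # True
```

The text content is identical before and after; only its representation in the serialized document has changed.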
This also means you should not attempt to nest one XML (or HTML) document inside another using CDATA sections. XML documents are not designed to nest inside one another. The correct solution to this problem is to use namespaces to sort out which markup is which, rather than trying ...
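As a rough sketch of the namespace approach, the example below embeds XHTML markup as real child elements rather than as text hidden in a CDATA section. The `urn:example:catalog` namespace and its element names are hypothetical, invented only for this illustration; the XHTML namespace URI is the standard one, and the parser is Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the "urn:example:catalog" namespace and its
# element names are made up for this example. The embedded XHTML is
# ordinary markup, distinguished from the host vocabulary by namespace.
document = """\
<item xmlns="urn:example:catalog" xmlns:html="http://www.w3.org/1999/xhtml">
  <name>Widget</name>
  <description>
    <html:p>A <html:em>very</html:em> useful widget.</html:p>
  </description>
</item>"""

root = ET.fromstring(document)
namespaces = {
    "cat": "urn:example:catalog",
    "html": "http://www.w3.org/1999/xhtml",
}

# Because the XHTML is real markup, it can be navigated like any other element.
paragraph = root.find("cat:description/html:p", namespaces)
emphasis = paragraph.find("html:em", namespaces)

print(paragraph.tag)   # {http://www.w3.org/1999/xhtml}p
print(emphasis.text)   # very
```

Because the embedded markup is part of the element tree, it stays well-formed, survives reserialization, and remains addressable by any namespace-aware tool.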