Leftovers

Here are three terms that appear throughout the XML literature and may stymie the XML beginner:

Attribute

A name-value pair that describes an element and appears inside its start tag. To reuse a previous example, in <img src="picture.jpg" />, src="picture.jpg" is an attribute of the img element. There is some controversy in the XML world about when to store information in the contents of an element and when to store it in attributes. The best set of guidelines on this particular issue can be found at http://www.oasis-open.org/cover/elementsAndAttrs.html.
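To make the attribute-versus-content choice concrete, here is a minimal sketch that records the same filename both ways and reads it back. It uses Python's standard xml.etree.ElementTree module purely for illustration; the nested <src> element in the second document is a hypothetical alternative layout, not something taken from the earlier example.

```python
import xml.etree.ElementTree as ET

# The same fact recorded two ways: as an attribute and as element content.
# The second layout (a nested <src> element) is a made-up alternative.
as_attribute = ET.fromstring('<img src="picture.jpg" />')
as_content = ET.fromstring('<img><src>picture.jpg</src></img>')

# Attributes are looked up by name on the element itself.
attr_value = as_attribute.get("src")

# Element content is reached by finding the child element and reading its text.
content_value = as_content.find("src").text

print(attr_value, content_value)
```

Both forms carry the same value. The difference lies in how a program reaches it and in how easily the document can later grow more structure around it.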

CDATA

The term CDATA (Character Data) is used in two contexts. Most of the time it refers to everything in an XML document that is not markup (tags, etc.). The second context involves CDATA sections. A CDATA section, delimited by <![CDATA[ and ]]>, tells an XML parser to leave that section of data alone even if it contains text that could be construed as markup.
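The effect of a CDATA section is easiest to see by parsing the same text with and without one. The sketch below again assumes Python's standard xml.etree.ElementTree module; the <note> element and its contents are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Text that looks like markup, wrapped in a CDATA section.
protected = "<note><![CDATA[Use <b> & </b> for bold text]]></note>"
note = ET.fromstring(protected)

# The parser hands the section back as plain text and creates no <b> child.
cdata_text = note.text
child_count = len(note)

# The same text without the CDATA wrapper is not well-formed XML,
# because the bare & is read as the start of an entity reference.
unprotected = "<note>Use <b> & </b> for bold text</note>"
try:
    ET.fromstring(unprotected)
    unprotected_parses = True
except ET.ParseError:
    unprotected_parses = False

print(repr(cdata_text), child_count, unprotected_parses)
```

Inside the CDATA section the < and & characters are treated as ordinary data. Outside it, the parser tries to interpret them as markup and rejects the document.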

PCDATA

Tim Bray’s annotation of the XML specification (mentioned earlier) gives the following definition:

The string PCDATA itself stands for “Parsed Character Data.” It is another inheritance from SGML; in this usage, “parsed” means that the XML processor will read this text looking for markup signaled by < and & characters.

You can think of PCDATA as character data (CDATA in the first sense above) potentially mixed with some markup. Most XML data falls into this classification.
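As a rough illustration of what "parsed" means here, the following sketch feeds a parser some content that mixes text, an entity reference, and a child element. It assumes Python's standard xml.etree.ElementTree module, and the <p> and <em> elements are made up for the example.

```python
import xml.etree.ElementTree as ET

# Parsed character data: the parser scans it for < and & and acts on both.
p = ET.fromstring("<p>Fish &amp; <em>chips</em></p>")

# The &amp; entity reference has been resolved to a literal ampersand.
leading_text = p.text

# The <em> tag was recognized as markup and became a child element.
child_tags = [child.tag for child in p]

# Joining every text fragment gives the character data with markup removed.
all_text = "".join(p.itertext())

print(repr(leading_text), child_tags, repr(all_text))
```

Compare this with a CDATA section, where the same < and & characters would have been passed through untouched.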

XML has a bit of a learning curve. This small tutorial should help you get started.
