Look at XML Documents Through the Lens of the XML Information Set

If you get a grip on the XML Information Set, you’ll know you don’t have to worry about it too much.

The XML Information Set, or Infoset (http://www.w3.org/TR/xml-infoset), is a W3C recommendation that defines an abstract data set for describing the information in a well-formed XML document. The document doesn’t have to be valid, but it does have to satisfy the Namespaces in XML constraints. The spec sets forth these definitions so that other W3C specs can share the same terminology and not trip over each other’s shoelaces.
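The example below makes the well-formed versus valid distinction concrete. It is a minimal sketch using Python’s standard `xml.dom.minidom`, whose Expat-based parser does not validate. The two sample documents and the `has_infoset` helper are hypothetical. Treating "the parser built a tree" as "the document has an infoset" is a simplification.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Well-formed but NOT valid: the internal DTD says <note> must contain <to>,
# yet the document supplies an undeclared <from> element instead.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>
<note><from/></note>"""

# Not well-formed: the <from> start tag is never closed.
NOT_WELL_FORMED = "<note><from></note>"

def has_infoset(text):
    """Hypothetical helper: True if a non-validating parser accepts text."""
    try:
        parseString(text)
        return True
    except ExpatError:
        return False

print(has_infoset(INVALID_BUT_WELL_FORMED))  # True
print(has_infoset(NOT_WELL_FORMED))          # False
```

The parser never checks the content model in the DTD, so the invalid document still yields a tree. A well-formedness error, by contrast, stops parsing outright.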

An infoset normally describes the result of parsing an XML document, but it can also be constructed by other means, such as through the Document Object Model (DOM) API (http://www.w3.org/TR/xml-infoset/#intro.synthetic). Even so, you don’t often hear folks talk about the structures in XML documents using the terms defined in this spec.
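The "other means" case is easy to see in code. The following minimal sketch assumes Python’s `xml.dom.minidom` as the DOM implementation. It builds a small tree entirely through DOM calls, without parsing any markup, and then checks that serializing and re-parsing the tree gives back the same markup. The `build_catalog` function and the `catalog`/`item` names are hypothetical.

```python
from xml.dom.minidom import getDOMImplementation, parseString

def build_catalog():
    """Construct a document through DOM calls alone (no markup is parsed)."""
    impl = getDOMImplementation()
    # createDocument(namespaceURI, qualifiedName, doctype) also creates
    # the document element, here <catalog>.
    doc = impl.createDocument(None, "catalog", None)
    item = doc.createElement("item")
    item.setAttribute("id", "x1")
    item.appendChild(doc.createTextNode("XML Hacks"))
    doc.documentElement.appendChild(item)
    return doc

synthetic = build_catalog()
markup = synthetic.documentElement.toxml()
print(markup)  # <catalog><item id="x1">XML Hacks</item></catalog>

# Parsing the serialized form yields a tree with the same structure.
parsed = parseString(markup)
print(parsed.documentElement.toxml() == markup)  # True
```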

The Infoset defines 11 kinds of information items, each with its own set of properties. The following list briefly outlines these information items and their associated properties:

Document information item

Properties: all declarations processed, base URI, character encoding scheme, children, document element, notations, standalone, unparsed entities, version

Element information item

Properties: attributes, base URI, children, in-scope namespaces, local name, namespace attributes, namespace name, parent, prefix

Attribute information item

Properties: attribute type, local name, namespace name, normalized value, owner element, prefix, references, ...
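To see how some of these properties line up with a familiar API, here is a minimal sketch that reads element and attribute information item properties off a `xml.dom.minidom` tree. The mapping is an approximation, not something the Infoset spec defines. For example, DOM’s `Attr.value` only stands in for [normalized value], and DOM lists `xmlns` declarations among ordinary attributes, whereas the Infoset separates them into [namespace attributes]. The sample document, the `urn:example:books` URI, and the helper names are hypothetical.

```python
from xml.dom import XMLNS_NAMESPACE
from xml.dom.minidom import parseString

# Hypothetical sample: a namespace declaration, a namespaced attribute,
# and a plain attribute on the document element.
SAMPLE = """<bk:book xmlns:bk="urn:example:books" bk:id="b1" lang="en">
  <bk:title>XML Hacks</bk:title>
</bk:book>"""

def attribute_item(attr):
    """Approximate an attribute information item from a DOM Attr."""
    return {
        "local name": attr.localName,
        "namespace name": attr.namespaceURI,
        "prefix": attr.prefix,
        # Approximation: the DOM value stands in for [normalized value].
        "normalized value": attr.value,
    }

def element_item(el):
    """Approximate an element information item from a DOM Element."""
    attributes, namespace_attributes = {}, {}
    for i in range(el.attributes.length):
        attr = el.attributes.item(i)
        # The Infoset keeps xmlns declarations in [namespace attributes];
        # the DOM mixes them in with the ordinary attributes.
        if attr.namespaceURI == XMLNS_NAMESPACE:
            namespace_attributes[attr.name] = attribute_item(attr)
        else:
            attributes[attr.name] = attribute_item(attr)
    return {
        "local name": el.localName,
        "namespace name": el.namespaceURI,
        "prefix": el.prefix,
        "attributes": attributes,
        "namespace attributes": namespace_attributes,
        # Only element children here; the Infoset [children] also
        # includes character, comment, and PI items.
        "children": [c.localName for c in el.childNodes
                     if c.nodeType == c.ELEMENT_NODE],
    }

doc = parseString(SAMPLE)
info = element_item(doc.documentElement)
print(info["local name"], info["prefix"], info["namespace name"])
# book bk urn:example:books
print(sorted(info["attributes"]))            # ['bk:id', 'lang']
print(sorted(info["namespace attributes"]))  # ['xmlns:bk']
print(info["children"])                      # ['title']
```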
