Mixed Content

In narrative documents, it’s common for a single element to contain both child elements and nonwhitespace character data that isn’t marked up. For example, recall this definition element from Chapter 2:

<definition>A <term>Turing Machine</term> refers to an abstract finite 
state automaton with infinite memory that can be proven equivalent 
to any other finite state automaton with arbitrarily large memory.
Thus what is true for one Turing machine is true for all Turing 
machines no matter how implemented.
</definition>

The definition element contains both nonwhitespace text and a term child. This is called mixed content. An element that contains mixed content is declared like this:

<!ELEMENT definition (#PCDATA | term)*>

This says that a definition element may contain parsed character data and term children. It does not constrain the order in which they appear or how many of each may appear, so this declaration allows a definition to have one term child, no term children, or 23 term children.
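To make the semantics of (#PCDATA | term)* concrete, here is a minimal Python sketch that uses only the standard library's xml.etree.ElementTree. ElementTree does not read or enforce DTDs, so conforms_to_mixed is a hypothetical hand-written helper that mimics this one kind of content model: it accepts any child element whose name is in an allowed set, in any order and in any number, and places no restriction on the surrounding text. The sketch also shows how ElementTree stores the interleaved text, in the .text and .tail attributes.

```python
import xml.etree.ElementTree as ET

# The definition element from Chapter 2, as shown above.
DEFINITION = """<definition>A <term>Turing Machine</term> refers to an abstract finite
state automaton with infinite memory that can be proven equivalent
to any other finite state automaton with arbitrarily large memory.
Thus what is true for one Turing machine is true for all Turing
machines no matter how implemented.
</definition>"""


def conforms_to_mixed(elem, allowed):
    """Hypothetical check mimicking a (#PCDATA | a | b ...)* content model.

    Text may appear anywhere, and child elements may occur in any order and
    any number of times, so the only thing to verify is that every child's
    name is in the allowed set. Each child's own content is not checked.
    """
    return all(child.tag in allowed for child in elem)


root = ET.fromstring(DEFINITION)

# ElementTree keeps text before the first child in .text and the text that
# follows each child in that child's .tail, preserving the interleaving.
print(repr(root.text))                     # 'A '
term = root[0]
print(term.tag, repr(term.text))           # term 'Turing Machine'
print(term.tail.startswith(" refers to"))  # True

allowed = {"term"}
zero = ET.fromstring("<definition>No terms here.</definition>")
many = ET.fromstring("<definition>" + "<term>x</term> and " * 23 + "</definition>")
bad = ET.fromstring("<definition>A <name>Turing</name> machine</definition>")

print(conforms_to_mixed(root, allowed))  # True  (one term child)
print(conforms_to_mixed(zero, allowed))  # True  (no term children)
print(conforms_to_mixed(many, allowed))  # True  (23 term children)
print(conforms_to_mixed(bad, allowed))   # False (name is not allowed here)
```

A real validating parser would also check each child's own content model; this sketch deliberately ignores that and looks only at child names.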

You can add any number of other child elements to a mixed-content list, although #PCDATA must always be the first item in the list. For example, this declaration says that a paragraph element may contain any number of name, profession, footnote, emphasize, and date elements in any order, interspersed with parsed character data:

<!ELEMENT paragraph
  (#PCDATA | name | profession | footnote | emphasize | date )*
>
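The rule that #PCDATA leads the list can likewise be sketched in Python. parse_mixed_decl below is a hypothetical helper, not part of any XML library: it relies on a simple regular expression, handles only declarations shaped like the ones in this section, and returns the element name together with the set of child names the declaration permits. It rejects a list whose first item is not #PCDATA.

```python
import re

# Hypothetical helper, not part of any XML library. The pattern only covers
# declarations shaped like the ones in this section:
#   <!ELEMENT name (item | item | ...)*>   (whitespace and newlines allowed)
DECL_RE = re.compile(r"<!ELEMENT\s+(\S+)\s*\((.*)\)\*?\s*>", re.DOTALL)


def parse_mixed_decl(decl):
    """Return (element_name, allowed_child_names) for a mixed-content declaration."""
    m = DECL_RE.fullmatch(decl.strip())
    if m is None:
        raise ValueError("not an element declaration with a parenthesized model")
    elem_name, body = m.groups()
    items = [item.strip() for item in body.split("|")]
    # The rule from the text: #PCDATA must be the first item in the list.
    if items[0] != "#PCDATA":
        raise ValueError("#PCDATA must come first in a mixed-content list")
    return elem_name, set(items[1:])


PARAGRAPH_DECL = """<!ELEMENT paragraph
  (#PCDATA | name | profession | footnote | emphasize | date )*
>"""

elem_name, allowed = parse_mixed_decl(PARAGRAPH_DECL)
print(elem_name)        # paragraph
print(sorted(allowed))  # ['date', 'emphasize', 'footnote', 'name', 'profession']

# Putting #PCDATA anywhere but first is rejected.
try:
    parse_mixed_decl("<!ELEMENT paragraph (name | #PCDATA)*>")
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)
```

Returning a set reflects the point made above: the order in which child names are listed after #PCDATA has no effect on which documents the declaration accepts.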

This is the only way to indicate that an element contains mixed content. You ...

From XML in a Nutshell, 3rd Edition (O’Reilly).