Text and Empty Patterns, Whitespace, and Mixed Content

So far, we have used text patterns only within group patterns. It’s important to remember, however, that a text pattern doesn’t match simply one text node but rather zero or more text nodes. This statement deserves some exploration.

The reason text patterns accept zero text nodes is linked to RELAX NG’s whitespace policy. Whitespace processing rules are one of the fuzzier areas in XML, and RELAX NG has attempted to find the “least surprising” policy that supports the most common usages. You’ll see more about whitespace processing when we study datatypes, but for now, let’s say that RELAX NG doesn’t distinguish between empty strings; no string at all; strings containing only whitespace before or after an element node; and, to a lesser extent, a single text child node containing only whitespace.

For instance, in the following snippet:

<foo at1="" at2=" ">
 <bar/>
 <bar></bar>
 <bar>
  <baz/>
  <baz/>
 </bar>
 <bar>
 </bar>
</foo>

RELAX NG treats all of the following as insignificant: the values of at1 and at2, the content of the first and second bar elements, the text between the third bar start tag and the first baz element, the text between the two baz elements, and even the text within the last bar element. RELAX NG’s rules state that such content should match either text or empty patterns. Here are two visible consequences for the patterns we’ve seen so far:

  • Because text patterns match any text node, they must match strings that are either ...
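
To make the whitespace rules concrete, here is a minimal sketch of a schema, written in RELAX NG’s XML syntax, that should accept the foo snippet shown earlier while using only empty patterns. The schema is an illustration and does not come from the original text; only the names foo, bar, baz, at1, and at2 are taken from the snippet.

```xml
<element name="foo" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- at1="" and at2=" " hold nothing but whitespace,
       so an empty pattern is expected to accept both values -->
  <attribute name="at1"><empty/></attribute>
  <attribute name="at2"><empty/></attribute>
  <oneOrMore>
    <element name="bar">
      <!-- A bar with no baz children may be written as <bar/>,
           <bar></bar>, or with whitespace-only content; the whitespace
           around each baz element is likewise ignored -->
      <zeroOrMore>
        <element name="baz"><empty/></element>
      </zeroOrMore>
    </element>
  </oneOrMore>
</element>
```

Replacing each empty pattern with a text pattern should accept the same document. The difference only shows up with other input: a text pattern would also accept content that contains non-whitespace characters, which an empty pattern rejects.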
