Whitespace and RELAX NG Native Datatypes
RELAX NG includes a native type system, but this type library has
been kept minimal by design because more complete type libraries are
available. It consists of just two datatypes (token and string) that
differ only in the whitespace processing applied before validation.
The whole RELAX NG datatype system can be seen as a mechanism for
adding validating transformations to text nodes. These transformations
change text nodes into canonical formats (formats in which all the
different representations of the same value are converted into a
single normalized or “canonical” format).
The two native datatypes don’t detect format errors (their formats
are broad enough to allow any value) but still transform text nodes
into their canonical forms, which can make a difference for
enumerations. Other datatype libraries, covered in Chapter 8, can
detect format errors.
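To make the effect on enumerations concrete, here is a minimal Python sketch of how the two native datatypes might compare a value from the schema with the text found in an instance document. It is an illustration only, not part of any RELAX NG implementation: the function names (normalize_token, normalize_string, matches_value) are hypothetical, and the sketch assumes that token collapses runs of XML whitespace and trims both ends, while string compares the text exactly as written.

```python
import re

# The four XML whitespace characters: space, tab, linefeed, carriage return.
XML_WHITESPACE = "\x20\x09\x0A\x0D"


def normalize_token(text: str) -> str:
    """Assumed behavior of the native token type: collapse each run of
    XML whitespace into one space and drop leading/trailing whitespace."""
    collapsed = re.sub(f"[{XML_WHITESPACE}]+", " ", text)
    return collapsed.strip(" ")


def normalize_string(text: str) -> str:
    """Assumed behavior of the native string type: no whitespace processing."""
    return text


def matches_value(datatype: str, schema_value: str, instance_text: str) -> bool:
    """Compare the canonical forms of the schema value and the instance text."""
    normalize = {"token": normalize_token, "string": normalize_string}[datatype]
    return normalize(schema_value) == normalize(instance_text)


if __name__ == "__main__":
    messy = "  in\n   stock "
    print(matches_value("token", "in stock", messy))   # True
    print(matches_value("string", "in stock", messy))  # False
```

With the token datatype, the padded and line-broken text still matches the enumerated value; with the string datatype, it matches only when the characters are identical.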
Enumerations are the first place you can see datatypes at work.
Applying datatypes to enumeration values is done by adding a type
attribute to value patterns. Up to now, we haven’t specified any
datatype when we’ve written value elements. By default, they have the
type token from the built-in library. Text values of this datatype
receive full whitespace normalization similar to that performed by
the XPath normalize-space() function: all sequences of one or more
whitespace characters, that is, #x20 (space), #x9 (tab), #xA
(linefeed), and #xD (carriage return), are replaced by a single ...