Chapter 8. Processing Text

So far in this book you've mainly been matching nodes in XML source documents to rules in stylesheets, but there is much to learn about what can be done with the text nodes that make up much of an element's content, and about processing raw text, too.

XML parsers and XSLT processors deal with whitespace-only text nodes in particular ways, and there are XSLT declarations that you can use to control how whitespace is handled inside elements.
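As a minimal sketch of the declarations mentioned above, the following stylesheet strips whitespace-only text nodes from all elements while preserving them in two named elements. The element names pre and code are assumptions chosen for illustration, not names taken from this chapter's examples.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Strip whitespace-only text nodes from every element in the source. -->
  <xsl:strip-space elements="*"/>

  <!-- Keep them inside the (hypothetical) pre and code elements.
       A named element takes precedence over the * wildcard. -->
  <xsl:preserve-space elements="pre code"/>

  <!-- Identity template: copy all remaining nodes through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Both declarations affect only text nodes that consist entirely of whitespace; text that mixes whitespace with other characters is left alone.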

Although XSLT is primarily designed to generate XML markup, you will find that you can use XSLT to produce plain text without markup in any convenient output format.
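A short sketch of plain-text output might look like the following. It assumes a hypothetical source document whose root element items contains item children, each with a name and a price child; none of these names come from the book. The stylesheet writes one comma-separated line per item.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Emit plain text: no XML declaration, no markup, no escaping. -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Ignore indentation between elements in the source document. -->
  <xsl:strip-space elements="*"/>

  <!-- Hypothetical input: <items><item><name/><price/></item>...</items> -->
  <xsl:template match="/items">
    <!-- Header row; &#10; is a newline character. -->
    <xsl:text>name,price&#10;</xsl:text>
    <xsl:for-each select="item">
      <xsl:value-of select="name"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="price"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

This sketch does not quote values that themselves contain commas or quotation marks, so real CSV output would need an extra step for such data.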

You may also be surprised to learn that you can do simple raw-text processing with XSLT by loading a text file and analyzing the content to find markers that you can use to construct XML elements or attributes.
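One way to sketch this idea, assuming an XSLT 2.0 processor, is with the unparsed-text() and tokenize() functions. The parameter name csv-uri, the file name data.csv, and the output element names rows, row, and field are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: URI of the text file to load,
       resolved relative to the stylesheet. -->
  <xsl:param name="csv-uri" select="'data.csv'"/>

  <!-- Named entry point, so no source XML document is needed. -->
  <xsl:template name="main">
    <rows>
      <!-- Split the file into lines (tolerating CRLF endings)
           and skip lines that are empty or whitespace-only. -->
      <xsl:for-each
          select="tokenize(unparsed-text($csv-uri), '\r?\n')[normalize-space()]">
        <row>
          <!-- Naive split on commas; quoted fields are not handled. -->
          <xsl:for-each select="tokenize(., ',')">
            <field><xsl:value-of select="."/></field>
          </xsl:for-each>
        </row>
      </xsl:for-each>
    </rows>
  </xsl:template>

</xsl:stylesheet>
```

Because the stylesheet starts from a named template rather than a source document, the processor has to be told where to begin; with Saxon, for example, this is done with the -it:main option. Where markers need to be captured rather than merely split on, the xsl:analyze-string instruction is the usual XSLT 2.0 alternative to tokenize().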

In this chapter you will do the following:

  • Learn what to expect in default whitespace character processing.

  • Use XSLT declarations to manage, strip, or preserve whitespace in output.

  • Make use of the <xsl:text> instruction to create a CSV file that can be read by a spreadsheet.

  • Load CSV data and parse it with regular expressions to create XML markup.

  • Compare transforming CSV content with XSLT to alternatives available in a spreadsheet.

Controlling Whitespace

Whitespace-only text nodes consist entirely of any sequence of the four characters tab, newline, carriage return, or space. In XML, whitespace in element-only (or empty) elements is not considered significant. ...
