Test XML Documents from the Command Line

A number of free, easy-to-use XML processors are available for use on the command line. This hack shows where to get four such tools and how to use them.

You can check XML documents for well-formedness and validity using tools on the command line or shell prompt. This hack discusses four tools: Richard Tobin’s RXP, Elcel’s XML Validator (xmlvalid), Daniel Veillard’s xmllint, and xmlwf (an application based on James Clark’s Expat C library).

RXP

You’ve already seen the online version of RXP [Hack #8]. This hack shows you how to use the command-line version, available free at http://www.cogsci.ed.ac.uk/~richard/rxp.html. You can download the C source and compile it yourself (ftp://ftp.cogsci.ed.ac.uk/pub/richard/rxp.tar.gz) or, if you are on Windows, simply download the executable rxp.exe (ftp://ftp.cogsci.ed.ac.uk/pub/richard/rxp.exe).

Once you’ve downloaded RXP and placed it in your path, you can check XML documents for well-formedness at a command prompt with this:

rxp time.xml

Upon success, this command will produce the output shown in Example 1-12.

Example 1-12. Output of RXP with time.xml

<?xml version="1.0" encoding="UTF-8"?>
<!-- a time instant -->
<time timezone="PST">
    <hour>11</hour>
    <minute>59</minute>
    <second>59</second>
    <meridiem>p.m.</meridiem>
    <atomic signal="true"/>
</time>
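
If you do not have a copy of time.xml at hand, the following is a minimal sketch, assuming a POSIX shell, that writes a document matching Example 1-12 to the current directory and then runs RXP on it. The guard around rxp is an addition for illustration: it only skips the call when rxp is not found in the path.

```shell
# Write a sample document matching Example 1-12 to time.xml.
cat > time.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!-- a time instant -->
<time timezone="PST">
    <hour>11</hour>
    <minute>59</minute>
    <second>59</second>
    <meridiem>p.m.</meridiem>
    <atomic signal="true"/>
</time>
EOF

# Run RXP only if it is installed; otherwise say so on standard error.
if command -v rxp >/dev/null 2>&1; then
    rxp time.xml
else
    echo "rxp not found in path" >&2
fi
```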

You can also check a document for validity by using the -V option, provided it has an accompanying DTD (as valid.xml does):

rxp -V valid.xml

When successful, you will see the output in Example 1-13.

Example 1-13. Output of RXP with valid.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE time SYSTEM "time.dtd">
<!-- a time instant -->
<time timezone="PST">
        <hour>11</hour>
        <minute>59</minute>
        <second>59</second>
        <meridiem>p.m.</meridiem>
        <atomic signal="true"/>
</time>

RXP has a number of other command options; for details, see ftp://ftp.cogsci.ed.ac.uk/pub/richard/rxp.txt. Also, there is a version of RXP that supports XML 1.1 (http://www.w3.org/TR/xml11/) and other up-and-coming specs. The source for this version of RXP is at ftp://ftp.cogsci.ed.ac.uk/pub/richard/rxp-1.4.0pre10.tar.gz, and a Windows executable is at ftp://ftp.cogsci.ed.ac.uk/pub/richard/rxp140pre4.exe.

xmlvalid

Elcel Technologies offers XML Validator (xmlvalid), a free command-line XML checker and validator (http://www.elcel.com/products/xmlvalid.html). You have to register to download the software. xmlvalid runs on Windows, Linux, Solaris, and other operating systems.

Once xmlvalid is downloaded, installed, and in the path, you can use it on the command line in this way to check a document for well-formedness (the -v switch means “don’t validate”):

xmlvalid -v time.xml

whereupon xmlvalid reports:

time.xml is well-formed

To check a document for validity, simply type:

xmlvalid valid.xml

and, assuming that the DTD is within reach of the processor, you get this response:

valid.xml is valid

For more command-line options, just type:

xmlvalid -h

Elcel also offers the C++ XML Toolkit (http://www.elcel.com/products/xmltoolkit.html) and OpenTop (http://www.elcel.com/products/opentop/index.html), a cross-platform C++ class library that is available under both commercial and free (GNU General Public License or GPL) licenses. The free version is available on Sourceforge (http://sourceforge.net/projects/open-top/).

xmllint

Another option for XML processing is xmllint, an application based on Daniel Veillard’s C library libxml2 (http://www.xmlsoft.org). xmllint comes with Cygwin and Red Hat, but can be downloaded separately along with the libxml2 library (http://xmlsoft.org/downloads.html). libxml2 is supported on Red Hat, Windows, Solaris, Mac OS X, and HP-UX.

Assuming that xmllint is installed, you can type this command at a command prompt to check a document for well-formedness:

xmllint time.xml

time.xml is well-formed, so the result will be a copy of the document (Example 1-14).

Example 1-14. Output of xmllint with time.xml

<?xml version="1.0" encoding="UTF-8"?>
<!-- a time instant -->
<time timezone="PST">
 <hour>11</hour>
 <minute>59</minute>
 <second>59</second>
 <meridiem>p.m.</meridiem>
 <atomic signal="true"/>
</time>

You can also check a document for validity by using the --valid switch:

xmllint --valid valid.xml

If the command is successful, it yields Example 1-15.

Example 1-15. Output of xmllint with valid.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE time SYSTEM "time.dtd">
<!-- a time instant -->
<time timezone="PST">
 <hour>11</hour>
 <minute>59</minute>
 <second>59</second>
 <meridiem>p.m.</meridiem>
 <atomic signal="true"/>
</time>

xmllint has many other options, which you can find by typing only xmllint at a prompt. libxml2 documentation is at http://xmlsoft.org/html/index.html. In addition to validating against a DTD, xmllint can also do validation against XML Schema [Hack #69] and RELAX NG [Hack #72].
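
One of those other options, --noout, suppresses the copy of the document so that only error messages (if any) are printed, which is convenient in scripts. The sketch below, assuming a POSIX shell, validates valid.xml quietly and reports the result based on xmllint's exit status. The contents of time.dtd are not shown in this hack, so the DTD written here is a hypothetical one invented to fit the sample document.

```shell
# Hypothetical DTD (an assumption; the real time.dtd is not shown in the text).
cat > time.dtd <<'EOF'
<!ELEMENT time (hour, minute, second, meridiem, atomic)>
<!ATTLIST time timezone CDATA #REQUIRED>
<!ELEMENT hour (#PCDATA)>
<!ELEMENT minute (#PCDATA)>
<!ELEMENT second (#PCDATA)>
<!ELEMENT meridiem (#PCDATA)>
<!ELEMENT atomic EMPTY>
<!ATTLIST atomic signal CDATA #REQUIRED>
EOF

# Sample document matching Example 1-15; it points at time.dtd.
cat > valid.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE time SYSTEM "time.dtd">
<!-- a time instant -->
<time timezone="PST">
 <hour>11</hour>
 <minute>59</minute>
 <second>59</second>
 <meridiem>p.m.</meridiem>
 <atomic signal="true"/>
</time>
EOF

# Validate without echoing the document; xmllint exits with 0 on success.
if command -v xmllint >/dev/null 2>&1; then
    if xmllint --noout --valid valid.xml; then
        echo "valid.xml is valid"
    else
        echo "valid.xml failed validation" >&2
    fi
else
    echo "xmllint not found in path" >&2
fi
```

Dropping --valid from the command gives the same quiet behavior for a plain well-formedness check.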

xmlwf

Another well-formedness checker is xmlwf, an application of the Expat C library for parsing XML (http://expat.sourceforge.net) that was originally written by James Clark. It comes with packages such as Cygwin on Windows and Red Hat Linux. You can also download it separately from Sourceforge as a Windows 32 executable (xmlwf.exe) and for other platforms.

With xmlwf installed and in the path, type:

xmlwf -v

xmlwf will report version and other information:

xmlwf using expat_1.95.7
sizeof(XML_Char)=1, sizeof(XML_LChar)=1, XML_DTD, XML_CONTEXT_BYTES=1024

Version 1.95.7 is the latest version as of this writing. To run xmlwf against a file, type this command:

xmlwf time.xml

If the file is well-formed, xmlwf is silent. However, if xmlwf finds a well-formedness error, it reports it and exits. For example, if you enter this line:

xmlwf bad.xml

xmlwf will report this error:

bad.xml:5:11: mismatched tag

This error message reports that on line 5, column 11 of bad.xml, xmlwf found a mismatched end tag (</howr>), which should have matched a previous start tag (<hour>).
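
Because xmlwf is silent on success, a script can treat any output as a failure. The following is a minimal sketch, assuming a POSIX shell; check_wf is a hypothetical helper name, and the bad.xml written here is an invented stand-in with the same kind of error, not the file used above, so the reported line and column will differ. The sketch tests for empty output rather than relying on an exit status, since the text only promises silence for well-formed input.

```shell
# Hypothetical malformed file: the end tag for <hour> is misspelled.
cat > bad.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<time timezone="PST">
 <hour>11</howr>
</time>
EOF

# check_wf FILE: print a one-line verdict based on whether xmlwf
# printed anything; return 1 if it reported a problem.
check_wf() {
    out=$(xmlwf "$1" 2>&1)
    if [ -z "$out" ]; then
        echo "$1: well-formed"
    else
        echo "$1: NOT well-formed: $out"
        return 1
    fi
}

# Run the check only if xmlwf is installed.
if command -v xmlwf >/dev/null 2>&1; then
    check_wf bad.xml || true
else
    echo "xmlwf not found in path" >&2
fi
```

To check several files in one go, call the helper in a loop, for example: for f in *.xml; do check_wf "$f"; done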
