Parsing XML

Say you have a set of XML files, each containing information about a book, and you want to build an index of the collection that shows each document’s title and author. You need to parse the XML files to recognize the title and author elements and their contents. You could do this by hand with regular expressions and string functions such as strtok(), but it’s a lot more complex than it seems. Moreover, such methods can break even on valid XML documents. For example, they can fail when an element carries attributes or when its text contains entities such as &amp; or CDATA sections. The easiest and quickest solution is to use one of the XML parsers that ship with PHP.
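Here is a minimal sketch of the index-building task that compares a naive regular expression with a real parser. It assumes the collection has already been read into an array of strings. The names `$books`, `naiveTitle()`, and `buildIndex()` are hypothetical, and SimpleXML (one of the parsers bundled with PHP) stands in for whichever parser you choose. Both sample documents are valid XML, yet the regex misses the second title because that element carries an attribute.

```php
<?php
// Hypothetical sample collection: each string stands in for one XML file.
// Both documents are valid XML describing a book.
$books = [
    '<book><title>Programming PHP</title><author>Rasmus Lerdorf</author></book>',
    '<book><title lang="en">XML &amp; PHP</title>'
        . '<author><![CDATA[Jane Doe]]></author></book>',
];

// Naive approach (hypothetical helper): assumes a bare <title> tag.
function naiveTitle(string $xml): ?string
{
    return preg_match('#<title>(.*?)</title>#', $xml, $m) ? $m[1] : null;
}

// Parser-based approach (hypothetical helper) using SimpleXML.
// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
function buildIndex(array $docs): array
{
    $index = [];
    foreach ($docs as $xml) {
        $book = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
        if ($book === false) {
            continue; // skip documents that are not well-formed
        }
        $index[] = [
            'title'  => (string) $book->title,  // entities are decoded
            'author' => (string) $book->author,
        ];
    }
    return $index;
}

$index = buildIndex($books);
foreach ($index as $entry) {
    echo "{$entry['title']} by {$entry['author']}\n";
}

// The regex finds nothing in the second document because of the lang attribute.
var_dump(naiveTitle($books[1])); // NULL
```

Even where the regex does match, it returns the raw markup between the tags, so a title such as `XML &amp; PHP` would come back with the entity still encoded. The parser handles attributes, entities, and CDATA without any extra code.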

PHP includes three XML parsers: an event-driven library based on the expat C library, a DOM-based library, and a third, appropriately named SimpleXML, for parsing simple XML documents.
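For a rough side-by-side comparison, the following sketch pulls the same `<title>` out of one hypothetical document with each of the three parsers. It assumes PHP 7 or later with the xml, dom, and simplexml extensions enabled, which is typical of standard builds. The variable names are illustrative only.

```php
<?php
$xml = '<book><title>Programming PHP</title><author>Rasmus Lerdorf</author></book>';

// 1. Event-based parser: collect text only while inside <title>.
$inTitle = false;
$eventTitle = '';
$parser = xml_parser_create();
// Element names are uppercased by default; keep them as written.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler(
    $parser,
    function ($p, string $name, array $attrs) use (&$inTitle) {
        if ($name === 'title') { $inTitle = true; }
    },
    function ($p, string $name) use (&$inTitle) {
        if ($name === 'title') { $inTitle = false; }
    }
);
xml_set_character_data_handler($parser, function ($p, string $data) use (&$inTitle, &$eventTitle) {
    if ($inTitle) { $eventTitle .= $data; } // text may arrive in several chunks
});
xml_parse($parser, $xml, true);

// 2. DOM: the whole document becomes a tree of node objects.
$dom = new DOMDocument();
$dom->loadXML($xml);
$domTitle = $dom->getElementsByTagName('title')->item(0)->textContent;

// 3. SimpleXML: child elements are exposed as object properties.
$simpleTitle = (string) simplexml_load_string($xml)->title;

echo "$eventTitle | $domTitle | $simpleTitle\n";
```

The event-based version has to track state (`$inTitle`) because each handler sees only one event at a time. DOM and SimpleXML, by contrast, build an in-memory tree that you query after parsing finishes.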

The most commonly used parser is the event-based library, which lets you parse XML documents but not validate them. This means you can find out which XML tags are present and what they surround. You can’t, however, check whether they’re the right tags in the right structure for this type of document, as a DTD or XML Schema would specify. In practice, this is rarely a big problem.
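The difference between parsing and validating can be sketched as follows, using a hypothetical helper named `checkWellFormed()`. The event-based parser rejects a document whose tags are mismatched. It accepts without complaint, however, a well-formed document whose structure makes no sense for a book.

```php
<?php
// Hypothetical helper: returns null if the parser accepts the document,
// or an error message with a line number otherwise.
function checkWellFormed(string $xml): ?string
{
    $parser = xml_parser_create();
    $ok = xml_parse($parser, $xml, true);
    // The parser is released when $parser goes out of scope.
    return $ok ? null : sprintf(
        '%s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}

// Well-formed but "wrong" for a book: no <title>, an unexpected <price>.
$wrongStructure = '<book><price>39.99</price></book>';
// Not well-formed: <title> is closed by </author>.
$malformed = '<book><title>PHP</author></book>';

var_dump(checkWellFormed($wrongStructure)); // NULL: accepted, since nothing is validated
var_dump(checkWellFormed($malformed));      // a string with the parser's error message
```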

As PHP’s event-based XML parser reads a document, it calls handler functions that you provide whenever it encounters certain “events,” such as the beginning or end of an element.
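To make the event sequence concrete, here is a minimal sketch that registers closures as the start-element, end-element, and character-data handlers and logs each call. The `$events` array and the sample document are illustrative. Case folding is turned off because the parser uppercases element names by default.

```php
<?php
// Records each event the parser reports, in the order it reports them.
$events = [];

$parser = xml_parser_create();
// Keep element names as written (they are uppercased by default).
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $parser,
    function ($p, string $name, array $attrs) use (&$events) {
        $events[] = "start $name";
    },
    function ($p, string $name) use (&$events) {
        $events[] = "end $name";
    }
);
xml_set_character_data_handler($parser, function ($p, string $data) use (&$events) {
    $events[] = "text '$data'";
});

$xml = '<book><title>Programming PHP</title></book>';
if (!xml_parse($parser, $xml, true)) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)) ?? 'parse error');
}

echo implode("\n", $events), "\n";
```

For this document the log should begin with `start book` and `start title` and end with `end title` and `end book`, with the title text in between. The character-data handler may be called more than once for a single run of text, so handlers that collect text should append to a buffer rather than assume one call per element.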

In the following sections, we discuss the handlers you can provide, the functions to set the handlers, and the events that trigger the calls to those handlers. We also provide sample functions for creating a parser to generate ...
