12.4. Parsing XML with SAX

Problem

You want to parse an XML document and format it on an event basis, such as when the parser encounters a new opening or closing element tag. For instance, you want to turn an RSS feed into HTML.

Solution

Use the parsing functions in PHP’s XML extension:

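$xml = xml_parser_create();
$obj = new Parser_Object;  // a class to assist with parsing

// Dispatch the handler names below to methods of $obj
xml_set_object($xml, $obj);
xml_set_element_handler($xml, 'start_element', 'end_element');
xml_set_character_data_handler($xml, 'character_data');
// Keep element names as written; by default they are uppercased
xml_parser_set_option($xml, XML_OPTION_CASE_FOLDING, false);

$fp = fopen('data.xml', 'r') or die("Can't read XML data.");
// Loop on feof() rather than on fread()'s return value, so the final
// call to xml_parse() always passes true, even if the last read is empty
while (!feof($fp)) {
  $data = fread($fp, 4096);
  xml_parse($xml, $data, feof($fp))
    or die(sprintf("Can't parse XML data: %s at line %d",
                   xml_error_string(xml_get_error_code($xml)),
                   xml_get_current_line_number($xml)));
}
fclose($fp);

xml_parser_free($xml);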
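
The recipe does not show Parser_Object itself. Below is a minimal sketch of what such a class might look like. It assumes the input is an RSS 2.0 feed whose <item> elements each contain a <title> and a <link>, and it builds an HTML list of links. Only the class name and the three method names come from the Solution; the properties, the sample feed, and the HTML produced are illustrative assumptions. Because the methods carry the names the Solution registers, the class fits the xml_set_object() version above. The demo instead passes [$obj, 'method'] callables directly, which does the same job without xml_set_object().

```php
<?php
// Hypothetical Parser_Object: collects <title> and <link> from each RSS
// <item> and emits one HTML list item per <item>. Assumes case folding
// is turned off, so element names arrive in lowercase as written.
class Parser_Object
{
    public $html = '';        // accumulated HTML output
    private $in_item = false; // true while inside an <item> element
    private $current = '';    // element whose text we are collecting
    private $title = '';
    private $link = '';

    public function start_element($parser, $tag, $attrs)
    {
        if ($tag === 'item') {
            $this->in_item = true;
            $this->title = $this->link = '';
        }
        $this->current = $tag;
    }

    public function end_element($parser, $tag)
    {
        if ($tag === 'item') {
            $this->html .= sprintf("<li><a href=\"%s\">%s</a></li>\n",
                htmlspecialchars(trim($this->link)),
                htmlspecialchars(trim($this->title)));
            $this->in_item = false;
        }
        $this->current = ''; // ignore text between elements
    }

    public function character_data($parser, $data)
    {
        // Text may arrive in several calls, so append instead of assigning.
        if ($this->in_item && $this->current === 'title') {
            $this->title .= $data;
        } elseif ($this->in_item && $this->current === 'link') {
            $this->link .= $data;
        }
    }
}

// Illustrative feed; the channel-level <title> is ignored by the class.
$rss = '<rss version="2.0"><channel><title>Feed</title>'
     . '<item><title>First &amp; best</title><link>http://example.com/1</link></item>'
     . '<item><title>Second</title><link>http://example.com/2</link></item>'
     . '</channel></rss>';

$obj = new Parser_Object;
$xml = xml_parser_create();
xml_parser_set_option($xml, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($xml, [$obj, 'start_element'], [$obj, 'end_element']);
xml_set_character_data_handler($xml, [$obj, 'character_data']);
xml_parse($xml, $rss, true)
    or die(xml_error_string(xml_get_error_code($xml)));

echo "<ul>\n" . $obj->html . "</ul>\n";
```

Character data can be delivered in more than one call, for instance on either side of an entity such as &amp;, which is why character_data() appends to $this->title rather than overwriting it.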

Discussion

These XML parsing functions require the expat library. Because Apache 1.3.7 and later ship with expat, however, this library is already installed on most machines. PHP therefore enables these functions by default, and you don't need to explicitly configure PHP to support XML.
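
If you are unsure whether a particular PHP build includes these functions, you can test for them at runtime instead of relying on the default. Here is a minimal sketch; the variable name $xml_available is illustrative:

```php
<?php
// Runtime check that the XML parser functions used in this recipe exist.
// 'xml' is the extension name PHP reports for these functions.
$xml_available = extension_loaded('xml')
    && function_exists('xml_parser_create');

if ($xml_available) {
    echo "XML parser functions are available.\n";
} else {
    echo "This PHP build lacks the XML extension.\n";
}
```

From the command line, php -m lists the loaded extensions, and this one appears there as xml.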

expat parses XML documents and lets you configure the parser to call functions when it encounters different parts of the file, such as an opening or closing element tag or character data (the text between tags). Based on the tag name, you can then choose whether to format or ignore the data. This is known as event-based parsing, and it contrasts with DOM XML, which uses a tree-based parser.
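
To make the event model concrete, the following sketch registers anonymous functions (rather than object methods) as handlers. It records each event in the order expat reports it and keeps only the text inside <title> elements, ignoring everything else. The sample document and the $events, $titles, and $inTitle variables are illustrative. It also feeds the document in deliberately tiny 7-byte chunks, much as the fread() loop in the Solution uses 4 KB chunks. The start and end events, and the assembled text, come out the same wherever the chunk boundaries fall; only the individual character-data events may be split differently.

```php
<?php
// Event-based parsing sketch: no tree is built; the parser calls these
// closures as it meets each start tag, end tag, and run of text.
$doc = '<rss><channel><title>News</title>'
     . '<item><title>Story</title></item></channel></rss>';

$events = [];      // every callback, in the order it fired
$titles = [];      // text collected from <title> elements only
$inTitle = false;  // are we currently inside a <title>?

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$events, &$titles, &$inTitle) {
        $events[] = "start:$name";
        if ($name === 'title') {
            $inTitle = true;
            $titles[] = '';  // open a new slot for this title's text
        }
    },
    function ($p, $name) use (&$events, &$inTitle) {
        $events[] = "end:$name";
        if ($name === 'title') {
            $inTitle = false;
        }
    }
);

xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$events, &$titles, &$inTitle) {
        $events[] = "text:$data";
        if ($inTitle) {
            $titles[count($titles) - 1] .= $data; // text elsewhere is ignored
        }
    }
);

// Feed the document in small pieces; only the last call passes is_final = true.
$chunks = str_split($doc, 7);
$last = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    xml_parse($parser, $chunk, $i === $last)
        or die(xml_error_string(xml_get_error_code($parser)));
}

print_r($titles); // $titles is now ['News', 'Story']
```

A tree-based DOM parser would load the whole document into memory before you could inspect any of it; here nothing is kept except what the handlers choose to store.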

A popular API for event-based ...