Parsing XML with SAX

With a SAX parser, you write functions that handle specific events, such as the start or end of an XML element (tag). When the parser encounters one of these events, it calls the matching handler function to process it.

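SAX is an approach to parsing XML documents, not to validating them: the parser reports documents that are not well-formed, but it does not check them against a DTD or schema. The xml_parser_create() function returns a SAX (Simple API for XML) parser.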
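
To illustrate the difference between parsing and validating, here is a minimal sketch of how a well-formedness error surfaces. The input string is a hypothetical example, not part of the original tutorial; the exact error message depends on the underlying XML library, so none is shown here.

```php
<?php
// Hypothetical malformed input: <b> is never closed.
$xml = '<a><b></a>';

$parser = xml_parser_create();

// xml_parse() returns 1 on success and 0 on failure.
$ok = xml_parse($parser, $xml, true);

if ($ok !== 1) {
    // Look up a readable message and the position of the problem.
    $message = xml_error_string(xml_get_error_code($parser));
    $line = xml_get_current_line_number($parser);
    echo "Not well-formed: $message (line $line)\n";
}
```

A document that is well-formed but breaks the rules of its DTD or schema would still parse successfully here, because the parser never consults them.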

The parser can read an XML file and react to various events. The following three events are the most important ones:

  • Beginning of an element (tag)
  • End of an element (tag)
  • Character data (the text between tags, often loosely called CDATA)
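
The three events can be seen in order with a small trace. This is only a sketch: the sample document and the $events array are made up for illustration, and closures are used as handlers so that the block is self-contained.

```php
<?php
// Hypothetical sample document, inlined to keep the sketch self-contained.
$xml = '<note id="1"><to>Ann</to></note>';

$events = []; // trace of the events in the order the parser reports them

$parser = xml_parser_create();
// Keep tag names as written instead of converting them to upper case.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $parser,
    // Called at the beginning of every element.
    function ($parser, $tag, $attrs) use (&$events) {
        $events[] = "start:$tag";
    },
    // Called at the end of every element.
    function ($parser, $tag) use (&$events) {
        $events[] = "end:$tag";
    }
);
xml_set_character_data_handler(
    $parser,
    // Called for the text between tags.
    function ($parser, $data) use (&$events) {
        $events[] = "text:$data";
    }
);

xml_parse($parser, $xml, true);

echo implode("\n", $events), "\n";
```

The trace opens with the start of note and to, and closes with the matching end events in reverse order. Text may reach the character data handler in more than one call, so a handler should append to what it has already seen rather than assume it receives the whole string at once.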

To use the SAX parser:

  1. Use xml_parser_create() to create a parser.
  2. Set options on the parser with xml_parser_set_option(), for example to skip whitespace and to treat tag names as case-sensitive.
  3. Set handlers on the parser:
    • The function xml_set_element_handler() sets the handlers for the beginning and end of an element.
    • The function xml_set_character_data_handler() sets the handler for character data.
  4. Use xml_parse() to parse the XML data.
  5. Call xml_parser_free() once parsing is complete.

<?php
 // 1. Creating a parser
 $sax = xml_parser_create();

 // 2. Setting options
 xml_parser_set_option($sax, XML_OPTION_CASE_FOLDING, false);
 xml_parser_set_option($sax, XML_OPTION_SKIP_WHITE, true);

 // 3. Setting callable handler functions
 xml_set_element_handler($sax, 'start_tag_handler', 'end_tag_handler');
 xml_set_character_data_handler($sax, 'cdata_handler');

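 // 4. Parsing the XML file; the third argument marks this as the final chunk
 if (!xml_parse($sax, file_get_contents('quotes.xml'), true)) {
  // Report what went wrong and where, since a failed parse is otherwise silent
  echo 'XML error: '.xml_error_string(xml_get_error_code($sax))
   .' at line '.xml_get_current_line_number($sax);
 }

 // 5. Free the parser (has no effect as of PHP 8.0, where the parser is an object)
 xml_parser_free($sax);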

 // The following code contains the handler functions:
 function start_tag_handler($sax, $tag, $attr) {
  switch ($tag) {
   case 'quotes' :
    echo '<h1>Quotes</h1><ol>';
    break;
   case 'quote' : 
    echo '<li><b>'.htmlspecialchars($attr['year'] ?? '').':</b>';
    break;
   case 'author' :
    echo '<br><i>';
  }
 }
 function end_tag_handler($sax, $tag) {
  switch ($tag) {
   case 'quotes' : echo '</ol>';
    break;
   case 'quote'  : echo '</li>';
    break;
   case 'author' : echo '</i>';
  }
 }
 function cdata_handler($sax, $data) {
  echo htmlspecialchars($data);
 }

The XML file parsed in the above code:

<?xml version="1.0"?>
<quotes>
  <quote year="2023">
    <coding>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua...</coding>
    <author>Author XYZ</author>
  </quote>
  <quote year="2022">
    <coding>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua...</coding>
    <author>Author ABC</author>
  </quote>
</quotes>

HTML created from XML
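
The example above loads the whole file with file_get_contents() and passes true as the last argument of xml_parse(). For larger files, the same parser can be fed piece by piece. The following is a sketch under stated assumptions: the temporary file, its contents, the $years array and the deliberately tiny chunk size are all hypothetical and chosen only to make the block self-contained.

```php
<?php
// Hypothetical sample file, written to a temporary path for this sketch.
$path = tempnam(sys_get_temp_dir(), 'sax');
file_put_contents($path, '<quotes><quote year="2023"/><quote year="2022"/></quotes>');

$years = []; // values of the year attribute, collected instead of echoed

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler(
    $parser,
    function ($parser, $tag, $attrs) use (&$years) {
        if ($tag === 'quote') {
            $years[] = $attrs['year'];
        }
    },
    function ($parser, $tag) {
        // Nothing to do at the end of an element in this sketch.
    }
);

$handle = fopen($path, 'rb');
while (!feof($handle)) {
    // Tiny chunk size, used here only to force several passes.
    $chunk = fread($handle, 16);
    // The third argument tells the parser whether this is the last chunk.
    if (!xml_parse($parser, $chunk, feof($handle))) {
        echo 'XML error: ', xml_error_string(xml_get_error_code($parser)), "\n";
        break;
    }
}
fclose($handle);
unlink($path);

echo implode(', ', $years), "\n";
```

Because the parser keeps its state between calls, a tag that is split across two chunks is still reported as a single event. In practice a chunk size of a few kilobytes would be a more sensible choice.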

For more, visit https://php.net/manual/book.xml.php.


Using XML: