Parse Feeds URL and XML Files

This tutorial describes how to parse an XML document or RSS feed that follows a known schema.

We use the simplexml_load_file() function to load XML data from a file or a URL.

  1. Using SimpleXML
  2. Extract RSS feed from a URL
  3. Parsing an XML File
  4. Handling Namespace Prefixes

Using SimpleXML

One of the greatest features of PHP is the SimpleXML extension. The approach is as simple as it is ingenious. You can access XML via an object-oriented programming (OOP) approach: subnodes are properties of their parent nodes/objects, and XML attributes are read from the element object with array-style syntax. This makes accessing XML very easy, and full iterator support means a foreach loop can be used.

Example: Using simplexml_load_file

<?php
 $xml = simplexml_load_file('sample.xml');
 foreach ($xml as $user) {
  echo $user->name  . ', ' .
       $user->email . '<br>';
 }
 /*Prints:
 BrainBell.com, admin@brainbell.com
 Fast-Tutorials.com, admin-fast-tutrials@outlook.com
 */

This code loads a file using simplexml_load_file() and then iterates over its elements. To parse XML that is held in a string, use simplexml_load_string() instead.

Example: Using simplexml_load_string

<?php
 $str = '<?xml version="1.0"?>
<users>
 <user>
  <name>BrainBell.com</name>
  <email>admin@brainbell.com</email>
 </user>
 <user>
  <name>Fast-Tutorials.com</name>
  <email>admin-fast-tutrials@outlook.com</email>
 </user>
</users>';

 $xml = simplexml_load_string($str);
 foreach ($xml as $user) {
  echo $user->name  . ', ' .
       $user->email . '<br>';
 }
 /*Prints:
 BrainBell.com, admin@brainbell.com
 Fast-Tutorials.com, admin-fast-tutrials@outlook.com
 */
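The following is a minimal sketch of the two access styles side by side: child elements are read as properties, attributes with array-style syntax. The id attribute and the listUsers() helper are assumptions added for illustration; they are not part of the sample above. Casting to (string) or (int) turns the SimpleXMLElement objects into plain PHP values.

```php
<?php
// Hypothetical sample data: the id attribute is added only for this sketch.
$str = '<?xml version="1.0"?>
<users>
 <user id="1">
  <name>BrainBell.com</name>
  <email>admin@brainbell.com</email>
 </user>
 <user id="2">
  <name>Fast-Tutorials.com</name>
  <email>admin-fast-tutrials@outlook.com</email>
 </user>
</users>';

// listUsers() is a hypothetical helper name, not a SimpleXML function.
function listUsers(string $xmlString): array
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        return []; // the string was not well-formed XML
    }
    $users = [];
    foreach ($xml->user as $user) {
        $users[] = [
            'id'    => (int) $user['id'],     // attribute: array-style access
            'name'  => (string) $user->name,  // child element: property access
            'email' => (string) $user->email,
        ];
    }
    return $users;
}

print_r(listUsers($str));
```

Without the casts, each value stays a SimpleXMLElement object, which can give surprising results in comparisons or when stored in an array.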

Extract RSS feed from a URL

The simplexml_load_file() function parses an XML file, or the XML returned from a URL, into an object, and returns false on failure. We can directly iterate over SimpleXML’s elements/items using the foreach loop.

Example: Read RSS Feed from a URL

<?php
 $url = 'https://brainbell.com/feed';
 $rss = simplexml_load_file($url);
 foreach ($rss->channel as $channel){
  foreach ($channel->item as $item){
   echo $item->link.'<br>';
   echo $item->title.'<br>';
   echo $item->description.'<br>';
   echo $item->pubDate.'<hr>';
  }
 }
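A remote feed can be unreachable or malformed, in which case simplexml_load_file() returns false and the foreach loop has nothing valid to iterate. The sketch below wraps the call in a hypothetical loadFeed() helper (the name is an assumption, not part of SimpleXML) that collects libxml parse errors with libxml_use_internal_errors() and returns null on failure. Network-level warnings from the stream layer may still be reported separately.

```php
<?php
/**
 * Load XML from a file path or URL.
 * Returns null when the source cannot be read or parsed.
 * loadFeed() is a hypothetical helper name used for this sketch.
 */
function loadFeed(string $source): ?SimpleXMLElement
{
    // Collect libxml errors internally instead of reporting each as a warning.
    $previous = libxml_use_internal_errors(true);

    $xml = simplexml_load_file($source);

    if ($xml === false) {
        // Send the collected parser messages to the error log.
        foreach (libxml_get_errors() as $error) {
            error_log(trim($error->message));
        }
        libxml_clear_errors();
    }

    // Restore whatever error mode was active before the call.
    libxml_use_internal_errors($previous);

    return $xml === false ? null : $xml;
}
```

A caller would check the result before looping, for example by calling loadFeed('https://brainbell.com/feed') and skipping the foreach when the return value is null.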

Parsing an XML File

SimpleXML is also useful to read a configuration file written in XML or process the result of a REST request.

Example: Reading an XML File

<?php
 $books = simplexml_load_file('bookstore.xml');
 foreach ($books as $book) {
  echo $book->name.'<br>';
  echo $book->price.' $<br>';
  echo $book->date.'<br>';
  echo '<b>Authors:</b><br>';
  foreach ($book->author as $author){
   echo $author->firstName . ' '.
        $author->lastName.'<br>';
  }
  echo '<hr>';
 }

Sample XML file to parse:

<?xml version="1.0"?>
<books>
 <book>
  <name>PHP Book</name>
  <date>2016-09-24</date>
  <price>5.00</price>
  <author>
   <firstName>Abc</firstName>
   <lastName>Def</lastName>
  </author>
  <author>
   <firstName>Admin</firstName>
   <lastName>BrainBell</lastName>
  </author>
 </book>
 <book>
  <name>XML Book</name>
  <date>2023-04-24</date>
  <price>10.00</price>
  <author>
   <firstName>Abc</firstName>
   <lastName>Def</lastName>
  </author>
 </book>
</books>
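Values read from the document are SimpleXMLElement objects, so it helps to cast them before doing arithmetic or storing them. The sketch below assumes the same structure as the sample file, shortened and passed in as a string so that it runs on its own. The summarizeBooks() helper is a hypothetical name introduced for illustration.

```php
<?php
// summarizeBooks() is a hypothetical helper, not part of SimpleXML.
// It converts each <book> into a plain PHP array with native types.
function summarizeBooks(SimpleXMLElement $books): array
{
    $summary = [];
    foreach ($books->book as $book) {
        $authors = [];
        foreach ($book->author as $author) {
            // String concatenation converts the elements to text.
            $authors[] = trim($author->firstName . ' ' . $author->lastName);
        }
        $summary[] = [
            'name'    => (string) $book->name,
            'price'   => (float) $book->price,  // cast before arithmetic
            'authors' => $authors,
        ];
    }
    return $summary;
}

// Shortened copy of the sample data, inlined so the sketch is self-contained.
$books = simplexml_load_string('<?xml version="1.0"?>
<books>
 <book>
  <name>PHP Book</name>
  <price>5.00</price>
  <author><firstName>Abc</firstName><lastName>Def</lastName></author>
 </book>
 <book>
  <name>XML Book</name>
  <price>10.00</price>
  <author><firstName>Abc</firstName><lastName>Def</lastName></author>
 </book>
</books>');

$summary = summarizeBooks($books);
$total = array_sum(array_column($summary, 'price'));
printf("Total: %.2f\n", $total); // prints "Total: 15.00"
```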

Handling Namespace Prefixes

Sometimes you come across an XML document that uses namespace prefixes. The SimpleXML extension doesn’t treat elements with a namespace prefix in the same way as elements without one.

For example, the elements that use a namespace prefix, such as <dc:creator>, need to be handled differently.

<?php
 //Extract RSS feed from a URL
 //...
  foreach ($channel->item as $item){
   echo $item->dc:creator . '<br>';
  }
 //Generates error

The $item->dc:creator expression fails with “Parse error: syntax error, unexpected token “:”, expecting “,” or “;” in…”, because a colon is not allowed in a PHP property name. If you remove the namespace prefix and write echo $item->creator instead, the script runs but does not print the value, because the creator element belongs to the dc namespace.

Example: Extract the value of the namespaced elements

To access elements that have a namespace prefix, pass two arguments to the children() method: the namespace prefix and a Boolean true, which indicates that the first argument is a prefix rather than a namespace URI:

<?php
 $url = 'https://brainbell.com/feed';
 $rss = simplexml_load_file($url);
 foreach ($rss->channel as $channel){
  foreach ($channel->item as $item){
   echo $item->link.'<br>';
   echo $item->title.'<br>';
   echo $item->description.'<br>';

   $dc = $item->children('dc', true);
   echo $dc->creator.'<hr>';

  }
 }
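A prefix such as dc is only a label chosen by the feed author; the namespace itself is identified by its URI. As a hedged alternative to the prefix lookup, children() can be given the namespace URI, which keeps working if a feed binds the same namespace to a different prefix. The sketch below uses a small inline feed and a hypothetical itemCreators() helper, both assumptions for illustration. The URI shown is the Dublin Core elements namespace that RSS feeds commonly bind to dc.

```php
<?php
// itemCreators() is a hypothetical helper, not part of SimpleXML.
// $dcUri defaults to the Dublin Core elements namespace URI.
function itemCreators(
    SimpleXMLElement $rss,
    string $dcUri = 'http://purl.org/dc/elements/1.1/'
): array {
    $creators = [];
    foreach ($rss->channel->item as $item) {
        // With a single argument, children() expects a namespace URI.
        $dc = $item->children($dcUri);
        $creators[] = (string) $dc->creator;
    }
    return $creators;
}

// Small inline feed so the sketch runs without a network request.
$feed = '<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
 <channel>
  <item>
   <title>First post</title>
   <dc:creator>Admin</dc:creator>
  </item>
 </channel>
</rss>';

$rss = simplexml_load_string($feed);

// Lists the prefix => URI pairs declared on the root element.
print_r($rss->getDocNamespaces());

print_r(itemCreators($rss));
```

Calling getDocNamespaces() first is a quick way to find out which prefixes and URIs an unfamiliar feed declares.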
