Parsing XML using SimpleXML might not seem so simple at first, but there is an easy way to get at all of the data you desire. The challenge is reading the PHP documentation for SimpleXML and in some cases finding clues in the comments. Namespaces are tricky for beginners, and CDATA sections might require some regular expressions to get at data.
Over the years I have had to parse XML numerous times. It’s something you will find yourself doing quite often because many web services serve XML data feeds. You might be requesting product data, jobs data, or any number of different types of data. A couple of years ago I decided to put together a script that shows how to parse nearly everything that could be in some XML, and it has proven to be a real time saver for me.
To demonstrate how easy parsing XML is, I’m first going to show you the XML we will be working with, then we will parse it.
    <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
    <rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0"
         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
    <channel>
      <title>Yahoo! Weather - Temecula, CA</title>
      <city data="Temecula, CA"/>
      <!-- yweather is a namespace -->
      <yweather:location city="Temecula" region="CA" />
      <image>
        <title>Yahoo! Weather</title>
        <width>142</width>
        <height>18</height>
        <link>http://weather.yahoo.com</link>
        <url>http://l.yimg.com/a/i/brand/purplelogo//uh/us/news-wea.gif</url>
      </image>
      <item>
        <description>
          <![CDATA[
          <img src="http://l.yimg.com/a/i/us/we/52/26.gif"/><br />
          <b>Current Conditions:</b><br />
          Cloudy, 63 F<BR />
          <BR /><b>Forecast:</b><BR />
          Wed - Partly Cloudy. High: 77 Low: 57<br />
          Thu - Partly Cloudy. High: 78 Low: 59
          ]]>
        </description>
      </item>
    </channel>
    </rss>
The XML above is a partial Yahoo! weather feed. I removed a lot of the elements to save space. You can see right away that it declares a namespace with the prefix yweather. It also has a description, which is HTML in a CDATA section. The CDATA section allows the HTML to sit inside the XML without the XML parser trying to interpret it as markup.
When the XML is in a response from another server, you’ll normally load it as a string. When the XML is in a file, you load it as a file. There are at least a few ways to load your XML, but I use simplexml_load_string and simplexml_load_file.
    // XML loaded as a string.
    // Compare against false explicitly: an empty root element
    // without attributes also casts to false in a plain if().
    if ( ( $feed = @simplexml_load_string( $string ) ) !== false ) {
        /**
         * If the parser didn't complain about malformed XML,
         * then this is where you will do your work.
         */
    }

    // XML loaded as a file
    if ( ( $feed = @simplexml_load_file( 'file.xml' ) ) !== false ) {
        /**
         * If the parser didn't complain about malformed XML,
         * then this is where you will do your work.
         */
    }
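The @ operator hides the reason a load failed. If you want to see what the parser objected to, one option is libxml's internal error buffer. The following is a minimal sketch, assuming a deliberately malformed sample string; the $messages array and the sample XML are made up for illustration.

```php
<?php
// Hypothetical input: the unclosed <title> makes this XML malformed on purpose.
$string = '<rss><channel><title>Broken</channel></rss>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$feed = simplexml_load_string($string);

$messages = array();
if ($feed === false) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError carrying a line number and a message.
        $messages[] = 'Line ' . $error->line . ': ' . trim($error->message);
    }
    libxml_clear_errors();
}

// Restore whatever error handling was active before.
libxml_use_internal_errors($previous);

echo implode("\n", $messages), "\n";
```

This takes more typing than @, but it tells you which line of the feed is broken, which helps when the XML comes from a server you don't control.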
$feed is now our SimpleXMLElement object, and we will use that object to get the data we need. To start out very basic, let's get the title (line 5). $feed happens to be at the position of the rss element, so keep that in mind.
    // title is inside channel which is inside rss ($feed)
    echo $feed->channel->title;
On line 6 we have an element named city, but the city name is actually inside an attribute named data. Why Yahoo! decided to do this I don't know, but let's get the city:
    /**
     * This is one way to get data from an attribute,
     * but it doesn't always work. It can come up empty,
     * for example, on an element reached through children()
     * with a namespace, where the lookup appears to stay
     * inside that namespace.
     */
    echo $feed->channel->city['data'];

    /**
     * Try this if necessary. It seems to work all the time.
     * It just takes more typing. No big deal, right?
     */
    echo $feed->channel->city->attributes()->data;
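One thing to keep in mind with either form is that what comes back is a SimpleXMLElement object, not a string. echo converts it for you, but comparisons and array keys do not. The sketch below assumes a cut-down copy of the feed and shows the string cast; the variable names are only for illustration.

```php
<?php
// A cut-down copy of the feed, just enough for the city element.
$feed = simplexml_load_string(
    '<rss><channel><city data="Temecula, CA"/></channel></rss>'
);

// The attribute comes back as an object...
$attribute = $feed->channel->city['data'];

// ...so cast it before storing or comparing it.
$city = (string) $attribute;

var_dump($attribute instanceof SimpleXMLElement); // bool(true)
var_dump($city === 'Temecula, CA');               // bool(true)

// isset() is a safe way to check for an attribute that may be absent.
var_dump(isset($feed->channel->city['missing'])); // bool(false)
```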
On line 8 you can see that there is another place in the XML to get the city name, but the city name here is namespaced. Again, why would Yahoo! do this? It doesn’t really matter why if we need to get at it. Don’t worry, it’s not very hard to deal with namespaces.
    // Easy way to always get all namespaces in advance
    $namespaces = $feed->getNamespaces(true);

    // Apply the yweather namespace to a variable
    $yweather = $feed->channel->children( $namespaces['yweather'] );

    /**
     * Finally, get the city name.
     * Notice we are not using $feed.
     */
    echo $yweather->location->attributes()->city;
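If you already know the namespace you are after, children() can also take it directly, either as the namespace URI or as the prefix with true as the second argument. This is a minimal sketch on a trimmed copy of the feed; $byUri and $byPrefix are illustrative names. The URI form is usually the safer of the two, since a feed is free to change its prefix but not its URI.

```php
<?php
// Trimmed copy of the feed with only the namespaced location element.
$xml = '<rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">'
     . '<channel><yweather:location city="Temecula" region="CA" /></channel>'
     . '</rss>';

$feed = simplexml_load_string($xml);

// Option 1: select the namespaced children by namespace URI.
$byUri = $feed->channel->children('http://xml.weather.yahoo.com/ns/rss/1.0');

// Option 2: select them by prefix; the second argument marks it as a prefix.
$byPrefix = $feed->channel->children('yweather', true);

// The city and region attributes carry no prefix, so attributes() reads them.
$cityByUri    = (string) $byUri->location->attributes()->city;
$cityByPrefix = (string) $byPrefix->location->attributes()->city;

echo $cityByUri, "\n";
echo $cityByPrefix, "\n";
```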
SimpleXML objects can be looped through like an array. Consider all of the attributes in the yweather namespaced location element.
    // Loop through the location element's attributes
    foreach( $yweather->location->attributes() as $attr => $val ) {
        echo $attr . ' = ' . $val . "\n";
    }
On lines 9 through 15 we have an image element with some children. What if we wanted to convert the data from the children into an array?
    // Get all children of an element
    $children = $feed->channel->image->children();

    /**
     * Create an array using the element names as keys
     * and the element contents as values. The cast to
     * string stores plain text instead of SimpleXMLElement objects.
     */
    $image = array();
    foreach( $children as $element => $contents ) {
        $image[$element] = (string) $contents;
    }
Line 17 starts a description element. How do we get the image location out of that? SimpleXML will hand us the contents of the CDATA section as a string, but it won't parse the HTML inside it, because to the XML parser a CDATA section is just text. We're going to need to use PHP's preg_match function to find what we're looking for.
    // Get the image out of the description
    $description = (string) $feed->channel->item->description;

    if ( preg_match( '/http:[^"]+\.gif/', $description, $matches ) ) {
        $the_image = $matches[0];
    }
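A regular expression is fine for a one-off like this, but it gets fragile once the HTML changes. As an alternative, here is a rough sketch that casts the description to a string and hands it to DOMDocument, which is a different technique from the preg_match approach above. It assumes the DOM extension is available, and the shortened feed is only for illustration.

```php
<?php
// Shortened copy of the feed with only the description element.
$xml = <<<'XML'
<rss><channel><item><description><![CDATA[
<img src="http://l.yimg.com/a/i/us/we/52/26.gif"/><br />
<b>Current Conditions:</b><br />
Cloudy, 63 F<BR />
]]></description></item></channel></rss>
XML;

$feed = simplexml_load_string($xml);

// Casting to string returns the text inside the CDATA section.
$html = (string) $feed->channel->item->description;

// Feed snippets are rarely clean HTML, so keep parser warnings quiet.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Take the src attribute of the first img element, if there is one.
$the_image = null;
$images = $doc->getElementsByTagName('img');
if ($images->length > 0) {
    $the_image = $images->item(0)->getAttribute('src');
}

echo $the_image, "\n";
```

The same pattern works for pulling links or other tags out of the description, without having to write a new regular expression for each one.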
This isn’t all there is to XML parsing, but this will get you through most jobs. Take some time to read through the SimpleXML documentation on php.net, and I’m sure you’ll be confident that you can parse any XML you come across.