trying to extract image url from an xml document (rss feed)

bottleweb · June 2, 2008

Hello, I'm currently building an RSS feed aggregator in PHP. I am able to obtain, store and display the title, description, etc., but haven't been able to obtain the image URL from the XML files (it's stored differently from the description/title in the structure, see below). Here is the structure of an item in a common RSS XML file (this one is from Yahoo):

<item>
    <title>story title</title> 
    <link>http://story_url</link> 
    <guid isPermaLink="false">/link/link2</guid> 
    <source>AP</source> 
    <pubDate>Mon, 02 Jun 2008 11:20:14 GMT</pubDate> 
    <description>description goes here</description> 
    <media:content url="http://imageurl.jpg" type="image/jpeg" height="130" width="111" /> 
    <media:text type="html"><p><a href="linkurl"><img src="http://imageurl.jpg" align="left" height="130" width="111" alt="photo" title="description" border="0"/></a></p><br clear="all"/></media:text> 
    <media:credit role="publishing company">(AP)</media:credit> 
</item>

What I'm interested in is the 'url' attribute of media:content, which stores the image URL. So far I've got the following code (works fine for title, description and URL):

foreach ($xml->item as $item) {

    $rss_feed_title = trim(strval($item->title));
    $rss_feed_url = trim(strval($item->link));
    $rss_feed_description = trim(strval($item->description));

    //try to obtain the image as well (this doesn't work)
    $rss_feed_image = trim(strval($item->media['url']));

    //insert code here to store the values in a database
}

The problem is this line:

$rss_feed_image = trim(strval($item->media['url']));

which doesn't work. I'm hoping someone can tell me what I'm doing wrong here. I think the reason I can't get this particular one is that I can only read simple <blah>stuff I need</blah> tags, but the image is stored as <media:content url="stuff I need">. I'm sure there's a simple solution for this; I'm just not very good with this html/css stuff.

bottleweb · June 2, 2008

Still haven't found a solution to this. Surely someone can help me?

bottleweb · June 2, 2008

I should clarify that I'm using:

$xml = simplexml_load_string($received_rss_feeds);

to load the XML (as a SimpleXMLElement object) in the first place.
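One likely cause, offered as a sketch rather than a confirmed diagnosis: media: is an XML namespace prefix, and plain property access such as $item->media only sees elements that have no namespace. SimpleXML's children() method accepts a namespace URI and returns the elements in that namespace, and attributes() then exposes the unprefixed url attribute. The snippet below assumes the feed declares xmlns:media="http://search.yahoo.com/mrss/" on its root element and nests items under <channel>, as in standard RSS 2.0; the sample feed string and the $images array are made up for illustration.

```php
<?php
// Assumed sample feed. The xmlns:media declaration is an assumption about
// the real feed; check the root element of the feed you are loading.
$received_rss_feeds = <<<EOT
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>story title</title>
      <link>http://story_url</link>
      <description>description goes here</description>
      <media:content url="http://imageurl.jpg" type="image/jpeg" height="130" width="111" />
    </item>
  </channel>
</rss>
EOT;

$xml = simplexml_load_string($received_rss_feeds);

$images = array(); // hypothetical collector, stands in for the database insert

foreach ($xml->channel->item as $item) {
    // children() with a namespace URI returns the elements in that namespace
    $media = $item->children('http://search.yahoo.com/mrss/');

    $rss_feed_image = '';
    if (isset($media->content)) {
        // url carries no prefix, so it is read through a plain attributes() call
        $attrs = $media->content->attributes();
        $rss_feed_image = trim(strval($attrs['url']));
    }

    $images[] = $rss_feed_image;
}

print_r($images);
```

If the items in your document sit directly under the root element, as the $xml->item loop above suggests, loop over $xml->item instead of $xml->channel->item.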

discomatt · June 2, 2008

You could always use regex:

<?php

$string = <<<EOT
<item>
    <title>story title</title> 
    <link>http://story_url</link> 
    <guid isPermaLink="false">/link/link2</guid> 
    <source>AP</source> 
    <pubDate>Mon, 02 Jun 2008 11:20:14 GMT</pubDate> 
    <description>description goes here</description> 
    <media:content url="http://imageurl.jpg" type="image/jpeg" height="130" width="111" /> 
    <media:text type="html"><p><a href="linkurl"><img src="http://imageurl.jpg" align="left" height="130" width="111" alt="photo" title="description" border="0"/></a></p><br clear="all"/></media:text> 
    <media:credit role="publishing company">(AP)</media:credit> 
</item>
EOT;

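function getMedia ( $type, $xml ) {

	$arr = array();

	// Grab everything between "<media:$type" and the closing ">" of the first such tag
	if (  preg_match( '%<media:'.preg_quote( $type, '%' ).'([^>]++)>%', $xml, $data ) == 0  )
		return FALSE;

	// Split that text into name="value" pairs (lowercase attribute names only)
	if (  preg_match_all( '%([a-z]++)="([^"]++)"%', $data[1], $attribs, PREG_SET_ORDER ) == 0  )
		return 0;

	// Build an associative array: attribute name => attribute value
	foreach( $attribs as $attrib )
		$arr[ $attrib[1] ] = $attrib[2];

	return $arr;

}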

$arr = getMedia( 'content', $string );

print_r( $arr );

?>

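getMedia will return FALSE if no <media:$type> tag is found, 0 if the tag is found but has no attributes, and an associative array of attribute names and values otherwise. Only the first matching tag in the string is examined. A tag written with nothing after its name, such as <media:foo>, also yields FALSE, because the pattern requires at least one character before the closing >.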
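Because FALSE and 0 are both falsy in PHP, a caller has to compare with === to tell the two cases apart. The sketch below shows one way to do that; getMedia() is repeated from the post above so the snippet runs on its own, while describeMedia() and the three test strings are hypothetical and exist only to label the outcomes.

```php
<?php
// getMedia() is repeated from the post above so this snippet runs on its own.
function getMedia($type, $xml) {
    $arr = array();
    if (preg_match('%<media:' . $type . '([^>]++)>%', $xml, $data) == 0)
        return FALSE;
    if (preg_match_all('%([a-z]++)="([^"]++)"%', $data[1], $attribs, PREG_SET_ORDER) == 0)
        return 0;
    foreach ($attribs as $attrib)
        $arr[$attrib[1]] = $attrib[2];
    return $arr;
}

// describeMedia() is a hypothetical helper that labels the three outcomes.
function describeMedia($type, $xml) {
    $result = getMedia($type, $xml);
    if ($result === FALSE) {
        return 'missing';            // no <media:$type ...> tag matched
    }
    if ($result === 0) {
        return 'no attributes';      // tag matched, but no name="value" pairs
    }
    return isset($result['url']) ? $result['url'] : 'no url attribute';
}

// Made-up inputs covering each return value
$with_image = '<item><media:content url="http://imageurl.jpg" type="image/jpeg" /></item>';
$bare_tag   = '<item><media:content /></item>';
$no_media   = '<item><title>story title</title></item>';

echo describeMedia('content', $with_image), "\n";
echo describeMedia('content', $bare_tag), "\n";
echo describeMedia('content', $no_media), "\n";
```

A loose check such as if (!getMedia(...)) would treat the missing tag and the attribute-less tag the same way, which may or may not be what you want when deciding whether to store an image URL.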
