trying to extract image url from an xml document (rss feed)


bottleweb

Hello, I'm currently building an RSS feed aggregator in PHP. I am able to obtain, store and display the title, description, etc., but I haven't been able to obtain the image URL from the XML files (it's stored differently from the description/title in the structure, see below). Here is the structure of a typical item in an RSS XML file (this one is from Yahoo):

 

<item>
    <title>story title</title> 
    <link>http://story_url</link> 
    <guid isPermaLink="false">/link/link2</guid> 
    <source>AP</source> 
    <pubDate>Mon, 02 Jun 2008 11:20:14 GMT</pubDate> 
    <description>description goes here</description> 
    <media:content url="http://imageurl.jpg" type="image/jpeg" height="130" width="111" /> 
    <media:text type="html"><p><a href="linkurl"><img src="http://imageurl.jpg" align="left" height="130" width="111" alt="photo" title="description" border="0"/></a></p><br clear="all"/></media:text> 
    <media:credit role="publishing company">(AP)</media:credit> 
</item>

 

What I'm interested in is the 'url' attribute of media:content, which stores the image URL. So far I've got the following code (it works fine for the title, description and link):

 

foreach ($xml->item as $item) {

    $rss_feed_title = trim(strval($item->title));
    $rss_feed_url = trim(strval($item->link));
    $rss_feed_description = trim(strval($item->description));

    //try to obtain the image as well (this doesn't work)
    $rss_feed_image = trim(strval($item->media['url']));

    //insert code here to store the values in a database
}

 

The problem is this line:

$rss_feed_image = trim(strval($item->media['url']));

which doesn't work. I'm hoping someone can tell me what I'm doing wrong here. I think the reason I can't get this particular value is that I only know how to read simple <blah>stuff I need</blah> tags, but the image is stored as an attribute: <media:content url="stuff I need">. I'm sure there's a simple solution for this; I'm just not very good with this XML stuff.

You could always use a regex:

 

<?php

$string = <<<EOT
<item>
    <title>story title</title> 
    <link>http://story_url</link> 
    <guid isPermaLink="false">/link/link2</guid> 
    <source>AP</source> 
    <pubDate>Mon, 02 Jun 2008 11:20:14 GMT</pubDate> 
    <description>description goes here</description> 
    <media:content url="http://imageurl.jpg" type="image/jpeg" height="130" width="111" /> 
    <media:text type="html"><p><a href="linkurl"><img src="http://imageurl.jpg" align="left" height="130" width="111" alt="photo" title="description" border="0"/></a></p><br clear="all"/></media:text> 
    <media:credit role="publishing company">(AP)</media:credit> 
</item>
EOT;

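function getMedia ( $type, $xml ) {

	$arr = array();

	// Capture the attribute text of the first <media:$type ...> tag;
	// preg_quote keeps special characters in $type from altering the pattern
	if (  preg_match( '%<media:'.preg_quote( $type, '%' ).'([^>]++)>%', $xml, $data ) == 0  )
		return FALSE;
	// Split the captured text into name="value" pairs (lowercase names only)
	if (  preg_match_all( '%([a-z]++)="([^"]++)"%', $data[1], $attribs, PREG_SET_ORDER ) == 0  )
		return 0;

	// Build the attribute name => value map
	foreach( $attribs as $attrib )
		$arr[ $attrib[1] ] = $attrib[2];

	return $arr;

}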

$arr = getMedia( 'content', $string );

print_r( $arr );

?>

 

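getMedia returns FALSE if no matching <media:type> tag is found, 0 if the tag is found but has no attributes, and an associative array of attribute names and values otherwise. FALSE and 0 are both falsy, so compare the result with === if you need to tell them apart.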
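If you would rather stay within SimpleXML, here is a minimal sketch of reading the attribute through the namespace prefix. The "media:" part of the tag is a namespace prefix rather than part of an element name, which is why $item->media['url'] finds nothing. Some assumptions on my part: the sample feed and its wrapping <rss>/<channel> elements are made up for illustration, the xmlns:media declaration uses the Media RSS namespace URI (the snippet in the question doesn't show one), and $images is just a hypothetical array standing in for your database insert.

```php
<?php
// Illustrative feed. The xmlns:media declaration is an assumption: the
// original snippet shows only a bare <item>, but the prefix must be
// declared somewhere for the document to parse cleanly.
$feed = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>story title</title>
      <link>http://story_url</link>
      <description>description goes here</description>
      <media:content url="http://imageurl.jpg" type="image/jpeg" height="130" width="111" />
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($feed);

// Hypothetical stand-in for storing the values in a database
$images = array();

foreach ($xml->channel->item as $item) {
    // Select the children that belong to the "media" prefix
    // (second argument true = treat the first argument as a prefix, not a URI)
    $media = $item->children('media', true);

    $rss_feed_image = '';
    if (isset($media->content)) {
        // The url attribute itself has no namespace, so attributes() takes no arguments
        $rss_feed_image = trim((string) $media->content->attributes()->url);
    }

    $images[] = $rss_feed_image;
}

print_r($images);
```

In the original loop $xml already seems to point at the channel, so there you would keep $xml->item and only add the children('media', true) part. Items without a media:content element simply end up with an empty string.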

