
php rss parser


jcombs_31


magpie is excellent, just started playing with it yesterday myself:

http://magpierss.sourceforge.net/

 

So you don't have to read too much, I'll give you a quick rundown.

 

place rss_fetch.inc, rss_parse.inc, rss_cache.inc, rss_utils.inc, and the extlib folder into your include dir.

 

include rss_fetch.inc in your script:

require_once("include/rss_fetch.inc");

 

set your url:

$url = "http://somesite.com/news.rss";

 

create rss object:

$rss = fetch_rss($url);

 

echo "<pre>";

print_r($rss);

echo "</pre>";

 

That'll show you how the items get stored in the rss object. You can loop through them and display them any way you want.

 

hope this helps

 

 


Thanks, does this have the ability to display some data in the feed rather than just a link?

 

I used the code in the readme file

 

<?php 
require_once('includes/rss_fetch.inc');
$url = "http://www.lockergnome.com/rss/web.php";
$rss = fetch_rss( $url );

echo "Channel Title: " . $rss->channel['title'] . "<p>";
echo "<ul>";
foreach ($rss->items as $item) {
 $href = $item['link'];
 $title = $item['title'];
 echo "<li><a href=$href>$title</a></li>";
}
echo "</ul>";

?>

 

and I get the links from the feed, but then I get a cache error:

 

"Cache couldn't make dir './cache'. Cache unable to open file for writing: ./cache\039d3bbd5586f0089b4edc1d921dbe72"


I looked at Magpie and decided it was a bit heavier than what I wanted. I opted instead to use this class: LastRSS.

 

It probably works in a similar manner to magpie. Here's some sample code that I used, to give you an idea.

 

// Path is an assumption; point it at wherever you put the lastRSS class file.
require_once 'lastRSS.php';
// $DBHOST, $DBUSER, $DBPWD, $DBDB, $nsid and $defstate are set elsewhere in the script.
// Note: the mysql_* functions used here were removed in PHP 7.
$rss = new lastRSS();
if ($rs = $rss->Get('http://... put url of xml feed here')) {
    $dbh = mysql_connect($DBHOST, $DBUSER, $DBPWD) or die();
    mysql_select_db($DBDB);
    if ($rs['items_count'] > 0) {
        // Walk the items from last to first
        for ($i = $rs['items_count'] - 1; $i >= 0; $i--) {
            $item = $rs['items'][$i];
            // Escape every field before it goes into the SQL strings
            foreach ($item as $key => $value)
                $item[$key] = addslashes($value);
            //YYYY-MM-DD HH:MM:SS (MYSQL FORMAT); 'H' is the 24-hour clock
            $pubdate = date('Y-m-d H:i:s', strtotime($item['pubDate']));
            // Look for an existing row by guid, or by link when there is no guid
            if (!empty($item['guid'])) {
                $sql = "SELECT COUNT(*) AS countof FROM mg_news WHERE guid = '$item[guid]'";
            } else {
                $sql = "SELECT COUNT(*) AS countof FROM mg_news WHERE url = '$item[link]'";
            }
            echo "$sql<br />";
            $rslt = mysql_query($sql, $dbh) or die(mysql_error());
            $row = mysql_fetch_assoc($rslt);
            //Tue, 14 Sep 2004 22:21:39
            if ($row['countof'] == 0) {
                $sql = 'INSERT INTO mg_news (mg_newssource_id, int_state, guid, date, subject, news, url) VALUES (';
                $sql .= "$nsid, $defstate, '$item[guid]', '$pubdate', '$item[title]', '$item[description]', '$item[link]')";
                $rslt = mysql_query($sql, $dbh);
                echo "$sql<br />";
            } else {
                echo 'Guid or Link exists, skipping<br />';
            }
        }
    }
} else {
    echo 'Uh oh, didn\'t work';
}

 

As you should note from my example, the entire purpose here was to get the information into a database table. There are a lot of alternative methods for doing that, including caching to a file.

 

I don't worry about that because I control, via cron, when this script goes out to the site and pulls information. They have simpler examples in their documentation.

 

The problem with writing your cache file is probably a permissions issue. Remember that the webserver process is the one writing the file to disk, so you need to make sure the permissions on the cache directory give the webserver user read/write access to that directory.
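
To make the permissions point concrete, here is a minimal sketch of a check you could run from the same script before calling fetch_rss(). The function name ensure_cache_dir is made up for illustration and is not part of Magpie; it only uses standard PHP filesystem calls.

```php
<?php
// Hypothetical helper (not part of Magpie): make sure a cache directory
// exists and that the current PHP process can write to it.
// Returns an empty string on success, or a short problem description.
function ensure_cache_dir($dir)
{
    // Try to create the directory (and any missing parents) if it is absent.
    if (!is_dir($dir) && !@mkdir($dir, 0775, true)) {
        return "could not create $dir";
    }
    // The directory exists; check that this process (the webserver user
    // when run through the webserver) may write into it.
    if (!is_writable($dir)) {
        return "$dir exists but is not writable by this process";
    }
    return '';
}
```

Calling ensure_cache_dir('./cache') and echoing the result should tell you whether the webserver user can create or write to the directory. As far as I know, Magpie also lets you point the cache somewhere else by defining MAGPIE_CACHE_DIR before including rss_fetch.inc, but check the docs for your version.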

 


 

Thanks, I fixed the permissions issue, but I still want a script that gives a glimpse of the content of the feed, not just the link. I have a much smaller script than Magpie that does the same thing.

 


A couple other things so you don't get confused by my code.

 

- I don't want to insert duplicate news stories in my news table, so I avoid that using either the guid or the link. When I wrote this, the assumption was that I would have multiple news sources, so I wasn't sure they would all support a guid (many don't), which is the equivalent of a unique URL.

 

 

So it should be pretty obvious that the class reads all the items and creates an array of them. I loop through them and inside the loop assign a temp variable item that is just one of the rss entries.

$item = $rs['items'][$i];

 

The important line to note is this one:

 

$sql .= "$nsid, $defstate, '$item[guid]', '$pubdate', '$item[title]', '$item[description]', '$item[link]')";

 

Here you can see the class item names:

 

$item['guid'] -> the unique url

$item['pubDate'] -> The publish date.

 

I have some date manipulation code in there that you might find of interest; it converts the publish date into a format MySQL is happy with.

 

$item['title'] -> The Title

$item['link'] -> The link

$item['description'] -> The text abstract you are looking for (usually the first n lines)
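
For the date conversion mentioned above, a small standalone sketch may help. The function name rss_date_to_mysql is hypothetical, and using gmdate() (so the stored value is always UTC) is my assumption; the code in the earlier post uses date(), which formats in the server's local timezone.

```php
<?php
// Hypothetical helper: convert an RSS pubDate string into the
// 'YYYY-MM-DD HH:MM:SS' form that MySQL DATETIME columns expect.
// Returns null when the date cannot be parsed.
function rss_date_to_mysql($pubDate)
{
    $ts = strtotime($pubDate);
    if ($ts === false) {
        return null;
    }
    // 'H' is the 24-hour clock ('h' would give 12-hour values).
    // gmdate() formats in UTC regardless of the server timezone.
    return gmdate('Y-m-d H:i:s', $ts);
}
```

With a feed date such as 'Tue, 14 Sep 2004 22:21:39 GMT' this gives '2004-09-14 22:21:39'. If the feed leaves out the timezone, strtotime() falls back to the server's default timezone, so the UTC value may be shifted.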

 


Oh yeah, in case you were wondering, you can see this in action at http://www.movie-gurus.com/ in that the News section is the data pulled from the feeds.  You have to click on a news item to get to the abstract view for a story, but that small paragraph is what gets pulled in in the description.


 

I didn't realize I could just add an $item['description']. Now I'm gettin where I wanna be.

 

thanks for your feedback.


Cool, glad you're getting there...

 

Some feeds don't offer descriptions, though most do. In case the feed doesn't have one, you could test for it with a line like:

 

$desc = isset($item['description']) ? $item['description'] : "";

 

So if a description is available it'll be used; otherwise $desc is an empty string and nothing is printed.
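
Putting that check together with the display loop, something like the following sketch could work. The function name render_items is made up for illustration; it assumes an array shaped like Magpie's $rss->items, where each item is an array with 'title', 'link' and sometimes 'description'.

```php
<?php
// Hypothetical helper: build an HTML list from an array shaped like
// Magpie's $rss->items. Items without a description get just the link.
function render_items(array $items)
{
    $html = "<ul>\n";
    foreach ($items as $item) {
        // Fall back to placeholders when a feed omits a field.
        $title = htmlspecialchars(isset($item['title']) ? $item['title'] : '(untitled)');
        $href  = htmlspecialchars(isset($item['link']) ? $item['link'] : '#');
        $desc  = isset($item['description']) ? $item['description'] : '';

        $html .= "<li><a href=\"$href\">$title</a>";
        if ($desc !== '') {
            // strip_tags() drops markup the feed embedded in the description;
            // htmlspecialchars() escapes whatever text remains.
            $html .= '<br />' . htmlspecialchars(strip_tags($desc));
        }
        $html .= "</li>\n";
    }
    return $html . "</ul>\n";
}
```

With Magpie you would call it as echo render_items($rss->items); once fetch_rss() has returned a result.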

 

Like I said, I just started playing with syndication yesterday, and I found Magpie tutorials all over the internet, even in a book I had around the house. I like it so far, but the comment about using a script to grab all the headlines and store them in a database via cron sounds very interesting. It solves bandwidth issues altogether; however, it kills the ability to get the news the minute it's posted. Anyway, just my thoughts.


Hi -

 

I'm trying to get a feed from the BBC Sports website (just as a test). Code is below:

 

<?
require_once("../inc/rss_fetch.inc");
$url = "http://news.bbc.co.uk/rss/sportonline_uk_edition/front_page/rss091.xml";
$rss = fetch_rss( $url );

echo "Channel Title: " . $rss->channel['title'] . "<p>";
echo "<ul>";
foreach ($rss->items as $item) {
$href = $item['link'];
$title = $item['title'];
echo "<li><a href=$href>$title</a></li>";
}
echo "</ul>";

?>

 

However, I'm getting the following error:

 

Warning: MagpieRSS: Failed to fetch http://news.bbc.co.uk/rss/sportonline_uk_e...page/rss091.xml. (HTTP Error: connection failed (11) in /inc/rss_fetch.inc on line 237

Channel Title:

 

 

Warning: Invalid argument supplied for foreach() in /news/news.php on line 20

 

I've tried with several RSS feeds but get the same error every time. Anyone know why?

 

Rgds,

 

Neil.


  • 1 year later...

Hi Neil,

 

I'm getting the same error on my site. I just found this from the magpie faq:

 

4. Error: MagpieRSS: Failed to fetch http://example.com/rss.xml. (HTTP Error: connection failed (1)

 

A connection error of type 1 means "permission denied". This usually means that your

ISP has configured PHP so that it can't open outgoing sockets (usually for security reasons).

 

The only solution to this is to ask your ISP for help.

 

Sometimes you'll also get the related `connection failed (11)` (e.g. on sourceforge.net)

which also means PHP is configured in such a way that Magpie can't work.

 

While this helps to know that it is a php configuration problem, it gives no clue on what needs to be changed to make it work.

 

This is frustrating :(


Okay, I was able to dive into the problem a little deeper. If you go to the Snoopy class and search for fsockopen, that is where the problem is occurring. Just echo $errstr there and you should see what kind of error you are getting. Mine is a "No route to host" problem.
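
If you'd rather not edit Snoopy itself, a standalone check along these lines should surface the same information. This is only a sketch: probe_connection is a made-up name, not part of Magpie or Snoopy, and the default port and timeout are assumptions.

```php
<?php
// Hypothetical diagnostic helper (not part of Magpie or Snoopy): try to
// open a TCP connection the way an HTTP client would, and report the
// error number and message PHP gives back.
function probe_connection($host, $port = 80, $timeout = 5)
{
    $errno = 0;
    $errstr = '';
    // '@' suppresses the PHP warning; details come back in $errno/$errstr.
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return array('ok' => false, 'errno' => $errno, 'errstr' => $errstr);
    }
    fclose($fp);
    return array('ok' => true, 'errno' => 0, 'errstr' => '');
}
```

Running print_r(probe_connection('news.bbc.co.uk')); on the server that hosts your script shows whether outgoing connections work at all. Per the PHP manual, an errno of 0 together with a failure means the error happened before the connect call, which usually points at name resolution.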


  • 4 months later...


 

Hi All,

I am very new to the world of PHP and programming. I have been doing a few small things here and there. I tried the Magpie RSS parser and I constantly run into the problem below. My code is the same as the one posted here.

 

Warning: MagpieRSS: Failed to fetch http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml (HTTP Error: connection failed (3) in \...\magpie\magpierss-0.72\rss_fetch.inc on line 238.

 

I am unable to find what the error connection failed (3) means.

Any help in this regard will be most helpful.

 

Regards,

Vivek C.A

 

