
Screen Scrape original content from RSS


kailash001


Hello guys, I'm trying to screen scrape the original content behind every item in an RSS feed. The RSS feed itself works fine; the problem appears when I try to scrape each article using the Simple HTML DOM library. The first item works, but when the script tries to extract the second feed item's original content I get this error:

 

Fatal error: Cannot redeclare file_get_html() (previously declared in C:\wamp\www\mashup\protected\views\articles\simple_html_dom.php:37) in C:\wamp\www\mashup\protected\views\articles\simple_html_dom.php on line 41

Part of my code is as follows:

 

foreach($RSS_DOC->channel->item as $RSSitem)
{

	$item_id 	= md5($RSSitem->title);
	$item_title = $RSSitem->title;
	$item_date  = date("Y-m-j G:i:s", strtotime($RSSitem->pubDate));
	$item_url	= $RSSitem->link;

	echo "Processing item '" , $item_id , "'<br/>";
	echo $item_title, " - ";
	echo $item_date, "<br/>";
	echo $item_url, "<br/>";

	//screen scrape original article
	include('simple_html_dom.php');
	$html = file_get_dom($item_url);  
	foreach($html->find('td[class=rel_headline_cmt]') as $element)
	{
		echo $element;
	}
}

Any help with this?


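Move the line include('simple_html_dom.php'); outside the foreach loop. You don't need to include the file at every iteration, and including it a second time tries to declare its functions again, which is what triggers the "Cannot redeclare" fatal error.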
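
A minimal sketch of that change, under some loud assumptions: the temporary file and demo_get_html() are hypothetical stand-ins for simple_html_dom.php and its fetch function, and the feed is an inline sample so the snippet runs without the library or network access. The point is only the placement of the include: once, before the loop, and with require_once so a repeated load is skipped.

```php
<?php
// Hypothetical stand-in for simple_html_dom.php: a file that declares one
// function, the way the real library declares file_get_html().
$lib = tempnam(sys_get_temp_dir(), 'lib');
file_put_contents($lib, '<?php function demo_get_html($url) { return "fetched " . $url; }');

// Load the library once, before the loop. require_once also skips the file
// if something else has already loaded it, so the function is never redeclared.
require_once $lib;

// Inline sample feed (assumption) standing in for the real $RSS_DOC.
$RSS_DOC = simplexml_load_string(
    '<rss><channel>'
    . '<item><title>A</title><link>http://example.com/a</link></item>'
    . '<item><title>B</title><link>http://example.com/b</link></item>'
    . '</channel></rss>'
);

$fetched = [];
foreach ($RSS_DOC->channel->item as $RSSitem) {
    // No include inside the loop: the function is already defined.
    $fetched[] = demo_get_html((string) $RSSitem->link);
}

unlink($lib);      // remove the temporary stand-in file
print_r($fetched);
```

In the real script the same idea would presumably be a single require_once 'simple_html_dom.php'; placed above the foreach, with the body of the loop left as it is.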

 

Thanks for the help. But now I'm getting another problem. I'm able to extract the 1st article properly, but the 2nd one is extracted twice, then the 3rd one once, and then I get this error:

 

Fatal error: Maximum execution time of 60 seconds exceeded in C:\wamp\www\mashup\protected\views\articles\simple_html_dom.php on line 70

Can you tell me how I can make the script run faster, or suggest any other solution?

It's hard to say where your problem is, but it's definitely some loop problem. One thing that caught my eye is this:

foreach($RSS_DOC->channel->item as $RSSitem)
{


Do you really need to loop through one item ($RSS_DOC->channel->item)? Maybe loop through the channel only?
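
For reference, here is a quick check of how SimpleXML treats $RSS_DOC->channel->item in a foreach. It is only a sketch: the two-item feed is a hypothetical inline sample, not the poster's feed. It suggests that this expression already visits every <item> under the channel, so the loop header on its own may not be what causes the repeated output or the timeout.

```php
<?php
// Hypothetical two-item feed, inlined so the check runs without network access.
$RSS_DOC = simplexml_load_string(
    '<rss version="2.0"><channel><title>Sample</title>'
    . '<item><title>First</title></item>'
    . '<item><title>Second</title></item>'
    . '</channel></rss>'
);

$titles = [];
foreach ($RSS_DOC->channel->item as $RSSitem) {
    // Each pass receives one <item> element; cast to string to get its text.
    $titles[] = (string) $RSSitem->title;
}

// Prints: 2 items: First, Second
echo count($titles), " items: ", implode(', ', $titles), "\n";
```

If the real feed behaves the same way, the next place to look would be the inner loop over $html->find(...), for example whether the selector matches more than one element on some pages.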

Archived

This topic is now archived and is closed to further replies.
