[SOLVED] Not just reading an RSS feed but also grabbing the linked article too?

whatnow · November 10, 2007

I have the following code, which grabs an RSS feed and shows the results in my browser as:

Title: Genzyme stumps up £7m for NCG partnership
Description: Genzyme UK has entered into a partnership with the NHS National Commissioning Group, under which it will provide funding of £7 million over three years to help support the care of patients with lysosomal storage disorders.

Link: http://www.pharmatimes.com/WorldNews/article.aspx?id=12177&src=WorldNewsRSS

Now, how would I go about making it read the RSS, pick up the URL of the full article, and then also grab the article from the web page, so that the result is:

Title: Genzyme stumps up £7m for NCG partnership
Description: Genzyme UK has entered into a partnership with the NHS National Commissioning Group, under which it will provide funding of £7 million over three years to help support the care of patients with lysosomal storage disorders.

Main Article:

Genzyme UK has entered into a partnership with the NHS National Commissioning Group, under which it will provide funding of £7 million over three years to help support the care of patients with lysosomal storage disorders.

LSDs are rare and often severe metabolic disorders – such as Gaucher, Fabry and Pompe diseases - that need specialist and multi....(etc)

Link: http://www.pharmatimes.com/WorldNews/article.aspx?id=12177&src=WorldNewsRSS

The page itself has the content in a span with the class 'newsContent'. Do I just need to write code which lifts this span out of the page? That seems like an inefficient way to achieve what I want, when ideally I could just request the content some other way. Are there other ways, or is a crude method the only way to take content from other sites like this? (I will happily admit, I am new to RSS.)

I've been searching the internet for this for three hours now and, to be honest, I'm not getting good results. I've read about bloggers stealing content, so it seems possible, but I've not found any practical code for doing just that.

I assure you this isn't for illicit gains. I've been asked to do it for a job interview I'm preparing for, so naturally any help would be more than appreciated.

Code:

<?php

$rssFeeds = array ('http://www.pharmatimes.com/p.aspx?n=ZGFpbHl2aWRlb25ld3M=&s=VmlkZW9OZXdz');

//Loop through the array, reading the feeds one by one
foreach ($rssFeeds as $feed) {
  readFeeds($feed);
}


function startElement($xp, $name, $attributes) {
  global $item, $currentElement;
  // Remember the current element so the other functions always know what we're parsing
  $currentElement = $name;
  // By default the parser converts element names to uppercase
  if ($currentElement == 'ITEM') {
    // We're only interested in the contents of the item element. This flag keeps track of where we are
    $item = true;
  }
}

function endElement($xp, $name) {
  global $item, $currentElement, $title, $description, $link;
  if ($name == 'ITEM') {
    // If we're at the end of the item element, display
    // the data, and reset the globals
    echo "<b>Title:</b> $title<br>";
    echo "<b>Description:</b> $description<br>";
    echo "<b>Link:</b> $link<br><br>";
    $title = '';
    $description = '';
    $link = '';
    $item = false;
  }
}

function characterDataHandler($xp, $data) {
  global $item, $currentElement, $title, $description, $link;
  // Only add to the globals if we're inside an item element.
  if ($item) {
    switch ($currentElement) {
      case "TITLE":
        // We use .= because this function may be called multiple times for one element.
        $title .= $data;
        break;
      case "DESCRIPTION":
        $description .= $data;
        break;
      case "LINK":
        $link .= $data;
        break;
    }
  }
}





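function readFeeds($feed) {
  // Open the feed for reading
  $fh = fopen($feed, 'r');
  if (!$fh) {
    return 'Could not open the feed';
  }

  // Create an XML parser resource
  $xp = xml_parser_create();

  // Define which functions to call when an element starts or ends
  xml_set_element_handler($xp, "startElement", "endElement");

  // Define which function receives the text between the tags
  xml_set_character_data_handler($xp, "characterDataHandler");

  while ($data = fread($fh, 4096)) {
    // The third argument tells the parser whether this is the last chunk
    if (!xml_parse($xp, $data, feof($fh))) {
      fclose($fh);
      return 'Error in the feed';
    }
  }
  fclose($fh);
}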
?>
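
For the "read the feed, then follow each link" part of the question, here is a minimal sketch of one possible shape. It uses SimpleXML rather than the event-based parser above, and it assumes an RSS 2.0 layout (rss > channel > item). The function names parseFeedItems() and attachArticles() are made up for illustration and are not from the original code; fetching over HTTP with file_get_contents() also assumes allow_url_fopen is enabled.

```php
<?php
// Sketch only: parseFeedItems() and attachArticles() are hypothetical helpers.

// Turn an RSS 2.0 document (as a string) into an array of items.
function parseFeedItems($rssXml)
{
    $items = array();
    $feed = simplexml_load_string($rssXml);
    if ($feed === false || !isset($feed->channel->item)) {
        return $items; // not a usable RSS 2.0 document
    }
    foreach ($feed->channel->item as $item) {
        $items[] = array(
            'title'       => trim((string) $item->title),
            'description' => trim((string) $item->description),
            'link'        => trim((string) $item->link),
        );
    }
    return $items;
}

// Fetch the page behind each item's link and store the raw HTML under 'html'.
// $fetcher is any callable taking a URL and returning a string or false;
// it defaults to file_get_contents so it can be swapped out when testing.
function attachArticles(array $items, $fetcher = 'file_get_contents')
{
    foreach ($items as $i => $item) {
        $html = call_user_func($fetcher, $item['link']);
        $items[$i]['html'] = ($html === false) ? '' : $html;
    }
    return $items;
}
```

Something like attachArticles(parseFeedItems(file_get_contents($feed))) would then give one array per item, with the full page HTML still needing to be cut down to the article text.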

whatnow · November 10, 2007

Never mind, I've found a dirty hack which will do the job.

Something like this:

<?php

$config['url']       = "http://www.pharmatimes.com/WorldNews/article.aspx?id=12190"; // url of html to grab
$config['start_tag'] = "<body>"; // where you want to start grabbing
$config['end_tag']   = "</body>"; // where you want to stop grabbing
$config['show_tags'] = 1; // do you want the tags to be shown when you show the html? 1 = yes, 0 = no

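class grabber
{
	public $error = '';
	public $html  = array();

	// Fetch $url and collect every chunk of HTML found between $start and $end
	function grabhtml( $url, $start, $end )
	{
		$file = file_get_contents( $url );

		if( $file )
		{
			// preg_quote() stops characters in the tags from being read as regex syntax
			$pattern = '#' . preg_quote( $start, '#' ) . '(.*?)' . preg_quote( $end, '#' ) . '#s';

			if( preg_match_all( $pattern, $file, $match ) )
			{
				// $match[0] holds the full matches (tags included), $match[1] the text between the tags
				$this->html = $match;
			}
			else
			{
				$this->error = "Tags cannot be found.";
			}
		}
		else
		{
			$this->error = "Site cannot be found!";
		}
	}

	// Remove the start and end tags from $html unless $show is set
	function strip( $html, $show, $start, $end )
	{
		if( !$show )
		{
			$html = str_replace( $start, "", $html );
			$html = str_replace( $end, "", $html );
		}

		return $html;
	}
}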

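$grab = new grabber;
$grab->grabhtml( $config['url'], $config['start_tag'], $config['end_tag'] );

echo $grab->error;

$string1 = '';

// $grab->html[0] holds the full matches and only exists when the grab succeeded
if( is_array( $grab->html ) && isset( $grab->html[0] ) )
{
	foreach( $grab->html[0] as $html )
	{
		// keep everything from the opening article span onwards
		$string1 = stristr( $grab->strip( $html, $config['show_tags'], $config['start_tag'], $config['end_tag'] ), '<span class="body">' ) . "<br>";
	}
}

// crude: split on the word "span" and keep the piece after the first occurrence
$string2 = explode( 'span', $string1 );
$string2 = isset( $string2[1] ) ? $string2[1] : '';
echo $string2;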

?>

It's not very clean, but it works, so hooray for me.
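
A possibly cleaner alternative to the regex and explode() approach is to let PHP's DOM extension parse the page and then ask for the span by its class. This is a minimal sketch, not a tested drop-in for the site above: extractSpanByClass() is a made-up name, it assumes the DOM extension is available, it assumes the class name contains no quotes or spaces, and it ignores character encoding issues.

```php
<?php
// Sketch only: extractSpanByClass() is a hypothetical helper.

// Return the text of the first <span> carrying the given class, or null if none is found.
function extractSpanByClass($html, $class)
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word so "body" does not also match "bodytext".
    $query = sprintf(
        '//span[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // textContent drops nested tags and keeps only the text.
    return trim($nodes->item(0)->textContent);
}
```

Called as extractSpanByClass(file_get_contents($config['url']), 'body'), this would stand in for the stristr() and explode() steps, with the class name changed to whatever the page actually uses.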
