
[SOLVED] Not just reading an RSS feed but also grabbing the linked article?


whatnow


I have the following code which grabs an RSS feed. It shows the results in my browser as:

 

Title: Genzyme stumps up £7m for NCG partnership

Description: Genzyme UK has entered into a partnership with the NHS National Commissioning Group, under which it will provide funding of £7 million over three years to help support the care of patients with lysosomal storage disorders.

Link: http://www.pharmatimes.com/WorldNews/article.aspx?id=12177&src=WorldNewsRSS

 

Now how would I go about actually making it read the RSS, pick up the URL for the full article and then additionally grab the article from the webpage, so as to result in:

 

Title: Genzyme stumps up £7m for NCG partnership

Description: Genzyme UK has entered into a partnership with the NHS National Commissioning Group, under which it will provide funding of £7 million over three years to help support the care of patients with lysosomal storage disorders.

Main Article:

Genzyme UK has entered into a partnership with the NHS National Commissioning Group, under which it will provide funding of £7 million over three years to help support the care of patients with lysosomal storage disorders.

 

LSDs are rare and often severe metabolic disorders – such as Gaucher, Fabry and Pompe diseases - that need specialist and multi....(etc)

Link: http://www.pharmatimes.com/WorldNews/article.aspx?id=12177&src=WorldNewsRSS

 

 

The page itself has the content in a span with the class 'newsContent'. Do I just need to write code which lifts this span out of the page? That seems like an inefficient way to achieve what I want, when ideally I could just request the content some other way. Are there other ways, or is a crude method the only way to take content from other sites like this? (I will happily admit, I am fresh to RSS.)

 

I've been searching the internet for this for three hours now and to be honest I'm not getting good results. I've read about bloggers stealing content, so it seems possible, but I've not found any practical code for doing just that.

 

I assure you this isn't for illicit gains. I've been asked to do it for a job interview I'm preparing for, so naturally any help would be more than appreciated.

 

Code:

 

<?php

$rssFeeds = array ('http://www.pharmatimes.com/p.aspx?n=ZGFpbHl2aWRlb25ld3M=&s=VmlkZW9OZXdz');

//Loop through the array, reading the feeds one by one
foreach ($rssFeeds as $feed) {
  readFeeds($feed);
}


function startElement($xp, $name, $attributes) {
  global $item, $currentElement;
  // Remember the current element so the other handlers know what we're parsing
  $currentElement = $name;
  // By default PHP's XML parser converts element names to uppercase
  if ($currentElement == 'ITEM') {
    // We're only interested in the contents of the item element. This flag keeps track of where we are
    $item = true;
  }
}

function endElement($xp, $name) {
  global $item, $currentElement, $title, $description, $link;
  if ($name == 'ITEM') {
    // At the end of an item element, display the data and reset the globals
    echo "<b>Title:</b> $title<br>";
    echo "<b>Description:</b> $description<br>";
    echo "<b>Link:</b> $link<br><br>";
    $title = '';
    $description = '';
    $link = '';
    $item = false;
  }
}

function characterDataHandler($xp, $data) {
  global $item, $currentElement, $title, $description, $link;
  // Only add to the globals if we're inside an item element
  if ($item) {
    switch ($currentElement) {
      case "TITLE":
        // We use .= because this function may be called multiple times for one element
        $title .= $data;
        break;
      case "DESCRIPTION":
        $description .= $data;
        break;
      case "LINK":
        $link .= $data;
        break;
    }
  }
}





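function readFeeds($feed) {
  // Open the feed for reading (remote URLs need allow_url_fopen enabled)
  $fh = fopen($feed, 'r');
  if (!$fh) {
    return 'Could not open the feed';
  }

  // Create an XML parser resource
  $xp = xml_parser_create();

  // Define which functions to call when an element starts/ends
  xml_set_element_handler($xp, "startElement", "endElement");

  // Define which function receives the text inside elements
  xml_set_character_data_handler($xp, "characterDataHandler");

  while ($data = fread($fh, 4096)) {
    // The third argument tells the parser whether this is the last chunk
    if (!xml_parse($xp, $data, feof($fh))) {
      fclose($fh);
      return 'Error in the feed';
    }
  }
  fclose($fh);
}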
?>


Never mind, I've found a dirty hack which will do the job.

 

Something like this:

 

<?php

$config['url']       = "http://www.pharmatimes.com/WorldNews/article.aspx?id=12190"; // url of html to grab
$config['start_tag'] = "<body>"; // where you want to start grabbing
$config['end_tag']   = "</body>"; // where you want to stop grabbing
$config['show_tags'] = 1; // do you want the tags to be shown when you show the html? 1 = yes, 0 = no

class grabber
{
var $error = '';
var $html  = '';

function grabhtml( $url, $start, $end )
{
	$file = file_get_contents( $url );

	if( $file )
	{
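		// preg_quote() keeps any regex metacharacters in the tags from being interpreted as pattern syntax
		if( preg_match_all( '#' . preg_quote( $start, '#' ) . '(.*?)' . preg_quote( $end, '#' ) . '#s', $file, $match ) )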
		{				
			$this->html = $match;
		}
		else
		{
			$this->error = "Tags cannot be found.";
		}
	}
	else
	{
		$this->error = "Site cannot be found!";
	}
}

function strip( $html, $show, $start, $end )
{
	if( !$show )
	{
		$html = str_replace( $start, "", $html );
		$html = str_replace( $end, "", $html );

		return $html;
	}
	else
	{
		return $html;
	}
}
}

$grab = new grabber;
$grab->grabhtml( $config['url'], $config['start_tag'], $config['end_tag'] );

echo $grab->error;

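$string1 = '';

// $grab->html is only an array when grabhtml() found a match
if( is_array( $grab->html ) )
{
	foreach( $grab->html[0] as $html )
	{
		// stristr() returns everything from the first '<span class="body">' onwards
		$string1 = stristr( $grab->strip( $html, $config['show_tags'], $config['start_tag'], $config['end_tag'] ), '<span class="body">' ) . "<br>";
	}
}

// Split on 'span' and keep the piece that follows the opening tag
$string2 = explode( 'span', $string1 );
echo isset( $string2[1] ) ? $string2[1] : '';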

?>

 

It's not very clean, but it works, so hooray for me.
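
For anyone landing on this thread later, here is a rough sketch of a tidier route to the same result, using PHP's bundled SimpleXML and DOM extensions instead of regex and explode(). The helper names parseFeedItems() and extractSpanByClass() are made up for this example, and the sample feed and article markup are stand-ins so the snippet runs without network access. The class name is a parameter because the question mentions 'newsContent' while the hack above looks for 'body'.

```php
<?php
// Hypothetical helper: turns RSS 2.0 XML into an array of title/description/link entries.
function parseFeedItems(string $xml): array
{
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return []; // not well-formed XML
    }

    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = [
            'title'       => trim((string) $item->title),
            'description' => trim((string) $item->description),
            'link'        => trim((string) $item->link),
        ];
    }
    return $items;
}

// Hypothetical helper: returns the text of the first <span> carrying the given class,
// or null if there is none. Assumes the class name contains no quote characters.
function extractSpanByClass(string $html, string $class): ?string
{
    if ($html === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word, so "newsContent" does not match "newsContentBox".
    $query = sprintf(
        '//span[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $nodes = $xpath->query($query);

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Stand-in data; a real script would fetch both with file_get_contents().
$sampleFeed = <<<'XML'
<rss version="2.0"><channel>
  <item>
    <title>Genzyme stumps up 7m for NCG partnership</title>
    <description>Genzyme UK has entered into a partnership with the NHS National Commissioning Group.</description>
    <link>http://www.pharmatimes.com/WorldNews/article.aspx?id=12177&amp;src=WorldNewsRSS</link>
  </item>
</channel></rss>
XML;

$sampleArticle = '<html><body><span class="newsContent">Full article <b>text</b> goes here.</span></body></html>';

foreach (parseFeedItems($sampleFeed) as $entry) {
    // In real use: $page = file_get_contents($entry['link']);
    $article = extractSpanByClass($sampleArticle, 'newsContent');

    echo "Title: {$entry['title']}\n";
    echo "Description: {$entry['description']}\n";
    echo "Main Article: " . ($article ?? '(not found)') . "\n";
    echo "Link: {$entry['link']}\n\n";
}
```

One caveat: loadHTML() may mangle non-ASCII characters such as the pound sign if the page does not declare its charset, so treat this as a starting point rather than a finished scraper.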
