Jump to content

Parse rss feed and update based on changes


terrid

Recommended Posts

Hi all

 

I am currently in the process of scoping out a script and I am needing some help.

 

I have a mysql table, that has several rows, each with the following:

 

title | description | price | rss_url | delivery_cost | retailer_message | discount | total

---------------------------------------------------------------------------------------------------------------

Halo: Reach (XBOX 360) | Description text | 35.99 | http://www.easycontentunits.com/rss/54626/669/rss2.rss | On sale at Gameplay | 0 | 35.99

----------------------------------------------------------------------------------------------------------------------------------------------

Halo: Reach | Description text | 36.89 | http://www.easycontentunits.com/rss/54626/669/rss2.rss | On sale at Toys R Us | 0 | 36.89

 

What I need to do is the following:

 

  • Select each row and extract the rss_url
  • Parse the rss_url and extract the above details from the rss feed
  • If the details in the rss feed has changed, update the row
  • If the actual rss feed item is no longer in the rss feed, delete it from the table.

 

So If the rss feed changes the first rows details in my db has the price field updated to 40.00, I need to update the row and the price field will change from 35.99 to say 40.00

 

If the row is no longer in the rss feed, then I need to delete it from the db.

 

Can anyone provide me a small snippet or any advice on how to go about doing this?

 

I can provide the code I'm using to actually insert the rows from the RSS feed if it helps?

 

Thanks

This is what I have at the moment:

 

$connection = mysql_connect("localhost","root","password");
if (!$connection)
{
  die('Could not connect: ' . mysql_error());
}

  mysql_select_db("at_any_price", $connection);

  $result = mysql_query("SELECT * from easy_contents");
  while($row = mysql_fetch_array($result))
  {

   $articles = array();
   $easy_url = $row['rss_url'];

   $rawFeed = file_get_contents($easy_url);
   $xml = new SimpleXmlElement($rawFeed);


   $channel = array();
   $channel['title']       = $xml->channel->title;
   $channel['link']        = $xml->channel->link;
   $channel['description'] = $xml->channel->description;


foreach ($xml->channel->item as $item)
{
	$article = array();
	$article['title'] = $item->title;
	$article['link'] = $item->link;
	$article['description'] = (string) trim($item->description);

	//strip out all the HTML tags
	$item->description = str_replace('<table><tr><td width="110">','', $item->description);
	$item->description = str_replace('</table>','', $item->description);
	$item->description = str_replace('</td>','', $item->description);
	$item->description = str_replace('<td>','', $item->description);
	$item->description = str_replace('<br />','', $item->description);
	$item->description = str_replace('<b>','', $item->description);
	$item->description = str_replace('</b>','', $item->description);
	$item->description = str_replace('</tr>','', $item->description);

	//find all url encoded £ signs and find the string after
	//string will be a price
	preg_match_all('/&#xA3;([0-9.]+)/', $item->description, $results);
	foreach ($results as $k => $v) {
	}

	//find the url encoded £ sign and append the price
	$all = '&#xA3;'.$v[0];
	$price_stripped = str_replace($all, '', $item->description);
	$desc = preg_match('/&#xA3;([0-9.]+)/', $item->description);

	//find the discount deleviry cost from the rss using the ~#£NUMBER
	//this is the discount
	preg_match_all('/~#£([0-9.]+)/', $item->description, $discount);
	foreach ($discount as $d => $disc) {
		str_replace("~#£","", $disc[0]);
	}

	//find the remaining £PRICE and this is the delivery cost
	//this is the delivery_cost
	preg_match_all('/£([0-9.]+)/', $item->description, $delivery_cost);
	foreach ($delivery_cost as $del => $deliv) {	
	}

   	//find the | char and find the string after it
	//this is the retailer_message
	preg_match_all('/\|(.*?)\./',$item->description,$match);           
	foreach ($match as $rel => $retail) {	
 		$retail[0] = str_replace("| ","", $retail[0]);
		$retail_mess = str_replace(" On","On", $retail[0]);		
}   
  }
}

 

It selects the rows from the db, selects the rss_url and then outputs the data.

I then need to update based on the output.

 

All help is welcomed

Right after a couple of tweaks I have the code updating rows in my database, the problem is, it seems to update all the rows with the last item in the RSS Feed and not the actual details of each individual item.

 

$connection = mysql_connect("localhost","root","password");
if (!$connection)
{
  die('Could not connect: ' . mysql_error());
}

  mysql_select_db("at_any_price", $connection);

  $result = mysql_query("SELECT * from easy_contents");
  while($row = mysql_fetch_array($result))
  {

   $articles = array();
   $easy_url = $row['rss_url'];

   $rawFeed = file_get_contents($easy_url);
   $xml = new SimpleXmlElement($rawFeed);


   $channel = array();
   $channel['title']       = $xml->channel->title;
   $channel['link']        = $xml->channel->link;
   $channel['description'] = $xml->channel->description;


foreach ($xml->channel->item as $item)
{
	$article = array();
	$article['title'] = $item->title;
	$article['link'] = $item->link;
	$article['description'] = (string) trim($item->description);

	//strip out all the HTML tags
	$item->description = str_replace('<table><tr><td width="110">','', $item->description);
	$item->description = str_replace('</table>','', $item->description);
	$item->description = str_replace('</td>','', $item->description);
	$item->description = str_replace('<td>','', $item->description);
	$item->description = str_replace('<br />','', $item->description);
	$item->description = str_replace('<b>','', $item->description);
	$item->description = str_replace('</b>','', $item->description);
	$item->description = str_replace('</tr>','', $item->description);

	//find all url encoded £ signs and find the string after
	//string will be a price
	preg_match_all('/&#xA3;([0-9.]+)/', $item->description, $results);
	foreach ($results as $k => $v) {
	}

	//find the url encoded £ sign and append the price
	$all = '&#xA3;'.$v[0];
	$price_stripped = str_replace($all, '', $item->description);
	$desc = preg_match('/&#xA3;([0-9.]+)/', $item->description);

	//find the discount deleviry cost from the rss using the ~#£NUMBER
	//this is the discount
	preg_match_all('/~#£([0-9.]+)/', $item->description, $discount);
	foreach ($discount as $d => $disc) {
		str_replace("~#£","", $disc[0]);
	}

	//find the remaining £PRICE and this is the delivery cost
	//this is the delivery_cost
	preg_match_all('/£([0-9.]+)/', $item->description, $delivery_cost);
	foreach ($delivery_cost as $del => $deliv) {	
	}

   	//find the | char and find the string after it
	//this is the retailer_message
	preg_match_all('/\|(.*?)\./',$item->description,$match);           
	foreach ($match as $rel => $retail) {	
 		$retail[0] = str_replace("| ","", $retail[0]);
		$retail_mess = str_replace(" On","On", $retail[0]);		
}   
		//echo $retail[0] . $deliv[0] . $row['rss_url'] . "<br />";
		$total = $v[0] + $deliv[0] - $disc[0];
		mysql_query("UPDATE easy_contents SET delivery_cost = '$deliv[0]', price = '$v[0]', total = '$total' WHERE rss_url = '$row[rss_url]'  ");

  }		
}

Add some debugging code

 

	mysql_query("UPDATE easy_contents SET delivery_cost = '$deliv[0]', price = '$v[0]', total = '$total' WHERE rss_url = '$row[rss_url]'  ");

Becomes

 

$sql = "UPDATE easy_contents SET delivery_cost = '$deliv[0]', price = '$v[0]', total = '$total' WHERE rss_url = '$row[rss_url]'";
if(!$query = mysql_query($sql)) {
    echo "Error on line ".__LINE__.". ".mysql_error().".<br />\nQuery: ".$sql;
    exit;
}
echo "Query OK. <br />\nUpdated rows: ".mysql_affected_rows().".<br />\nQuery: ".$sql;

What's the output?

$sql = "UPDATE easy_contents SET delivery_cost = '$deliv[0]', price = '$v[0]', total = '$total' WHERE rss_url = '$row[rss_url]' AND title = '$item->title' AND description = '$price_stripped' ";
		if(!$query = mysql_query($sql)) {
		    echo "Error on line ".__LINE__.". ".mysql_error().".<br />\nQuery: ".$sql;
		    exit;
		}
		echo "Query OK. <br />\nUpdated rows: ".mysql_affected_rows().".<br />\nQuery: ".$sql

 

Query OK.

Updated rows: 1.

 

The following seemed to work.

 

How would I do the next part of my script?

If the item in the RSS has been removed, then delete the row from the db?

Well, you'll need to store each item's GUID for a start. It's the only unique identifier you've got.

 

I'm sure there's a more elegant solution, but this is all that came to mind:

 

During your loops, build two arrays. In one, store all of the GUIDs live from the feed. In the other, store all the GUIDs from the database rows. Compare the arrays at the end of your script to work out which row GUIDs have no corresponding feed item GUIDs, and delete accordingly.

Could you provide any code?

 

I have in the table an id row for each item that gets inserted into the table from the RSS feed.

The RSS feed does not have an ID. What I was checking on is the retailer_message (On sale at Amazon) for example. This is the same in the database as it is in RSS feed, so this was what I was using to check if things were the same, i.e. if the price had changed.

 

What I was thinking was:

 

Count how many rows I have with the rss_url in the db, if it's 5, store in a variable.

Parse the XML using the same rss_url, if the count is 5 and matches the above variable, do nothing

If it is 6, delete all the rows with the rss_url and then re-insert the parsed rss feed, with the 6 items.

 

This sounds a bit drastic though

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.