Page scraping: need help.

jamesxg1 · October 27, 2009

Hiya peeps,

OK, this is long, so here goes.

<?php

require("taggrab.class.php");

// Amazon UK listing page to scrape
$urlrun = "http://www.amazon.co.uk/s/ref=amb_link_43904365_1?ie=UTF8&node=91&pf_rd_m=A3P5ROKL5A1OLE&pf_rd_s=left-1&pf_rd_r=0YH9RVM0ZQYMNAH16BN2&pf_rd_t=101&pf_rd_p=468791753&pf_rd_i=91";

// Start and end markers of the HTML fragments to pull out of the page
$stag = '<li><a href="';
$etag = "</a></li>";

$tspider = new tagSpider();
$tspider->fetchPage($urlrun);
$linkarray = $tspider->parse_array($stag, $etag);

// Escape the URL before printing it, since it contains & characters
echo "<h2>Links present on page: " . htmlspecialchars($urlrun) . "</h2><br />";

foreach ($linkarray as $result) {
    echo $result;
    echo "<br/>";
}

?>

<?php

class tagSpider
{
    public $ch;         // this will hold our cURL instance
    public $html = "";  // this is where we dump the html we get
    public $binary = 0; // set for binary type transfer
    public $url = "";   // this is the url we are going to do a pass on

    // A method named after the class no longer acts as a constructor in PHP 8,
    // so use __construct() instead
    public function __construct()
    {
        $this->html = "";
        $this->binary = 0;
        $this->url = "";
    }

    public function fetchPage($url)
    {
        $this->url = $url;
        if (!empty($this->url)) {
            $this->ch = curl_init(); // start cURL instance
            curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, 1); // this tells cURL to return the data
            curl_setopt($this->ch, CURLOPT_URL, $this->url); // set the url to download
            curl_setopt($this->ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if any
            curl_setopt($this->ch, CURLOPT_BINARYTRANSFER, $this->binary); // binary flag (ignored by current PHP versions)
            $result = curl_exec($this->ch); // grabs the webpage from the internet
            $this->html = ($result === false) ? "" : $result; // curl_exec() returns false on failure
            curl_close($this->ch); // closes the connection
        }
    }

    // this function takes the grabbed html and picks out the pieces we want
    public function parse_array($beg_tag, $close_tag)
    {
        // preg_quote() makes the tags match literally, including any "/" they contain
        $pattern = '/' . preg_quote($beg_tag, '/') . '.*' . preg_quote($close_tag, '/') . '/siU';
        preg_match_all($pattern, $this->html, $matching_data); // match data between specified tags
        return $matching_data[0];
    }
}
?>

OK, there's the script. I need it to extract the following (I've marked the info I need with **** ****):

<div class="productTitle">
<a href="A URL WILL BE HERE"> **** The Digital Photography Book: The Step-by-step Secrets for How to Make Your Photos Look Like the Pros'! ****</a>
<span class="ptBrand">**** by Scott Kelby **** </span>

<div class="newPrice">
<a href="A URL WILL BE HERE">Buy new</a>
: 
<strike>£13.99</strike>
<span>**** £6.98 ****</span>
</div>

How do I do this?

Cheers, buds.

James.

jamesxg1 · October 27, 2009

Anyone? I need this, peeps, please.

Many thanks,

James.

mrMarcus · October 27, 2009

You're going to have to use preg_match(), I'd think.

Match the <div class="productTitle"> and grab everything within the <div>...</div> tags and slap it in an array?

I've never done any screen scraping before.

Amazon does have a Web Service/API available. I can only imagine it's fully documented. Not sure if it'd be right for what you're trying to do, though.

Sorry I couldn't be of any more help.
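
A rough sketch of that idea, assuming each listing follows the productTitle markup from the first post (the live Amazon page may well differ). The sample HTML, the example.com URLs, and the helper name extractTitles() are all made up for illustration; preg_match_all() with PREG_SET_ORDER is used so each listing comes back as its own array.

```php
<?php
// Sample markup modelled on the snippet in the first post (not the live page).
$html = <<<HTML
<div class="productTitle">
<a href="http://example.com/book-1"> The Digital Photography Book </a>
<span class="ptBrand">by Scott Kelby </span>
</div>
<div class="productTitle">
<a href="http://example.com/book-2"> Another Book </a>
<span class="ptBrand">by Someone Else </span>
</div>
HTML;

// Hypothetical helper: returns one array per listing with 'title' and 'author'.
function extractTitles($html)
{
    // [^"]* skips whatever URL the listing has; the two (.*?) groups capture
    // the link text and the brand text.
    $pattern = '~<div class="productTitle">\s*'
             . '<a href="[^"]*">(.*?)</a>\s*'
             . '<span class="ptBrand">(.*?)</span>~si';

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $listings = array();
    foreach ($matches as $m) {
        $listings[] = array(
            'title'  => trim($m[1]),
            'author' => trim($m[2]),
        );
    }
    return $listings;
}

foreach (extractTitles($html) as $listing) {
    echo $listing['title'] . ' / ' . $listing['author'] . "\n";
}
```

In the script from the first post, the same pattern could presumably be run against $tspider->html after fetchPage() instead of the sample string.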

jamesxg1 · October 27, 2009

Hiya,

Thanks for the reply. Ah, I see. I don't know how to use that function, and the links will be different for every listing, so I can't see it working. preg_match_all() might work, but I have no idea how to use it. I do know that .*? is what I would need to use.

And they do, but you have to implement it into a site, I believe.

Many thanks,

James.
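
To illustrate the .*? idea: the lazy quantifier matches as little as possible, so a pattern can skip over whatever link each listing has and still capture the text that follows. This is only a minimal sketch, assuming the newPrice markup looks like the snippet in the first post; the second listing, the example.com URLs, and the variable names are invented for illustration.

```php
<?php
// Two made-up listings with different links; structure follows the first post.
$html = <<<HTML
<div class="newPrice">
<a href="http://example.com/offer/111">Buy new</a>
:
<strike>£13.99</strike>
<span>£6.98</span>
</div>
<div class="newPrice">
<a href="http://example.com/offer/222?ref=abc">Buy new</a>
:
<strike>£20.00</strike>
<span>£15.49</span>
</div>
HTML;

// .*? is lazy: it stops at the first place the rest of the pattern can match,
// so the differing URLs are skipped without being spelled out.
// The "s" modifier lets "." match line breaks as well.
$pattern = '~<div class="newPrice">.*?<strike>(.*?)</strike>\s*<span>(.*?)</span>~s';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$prices = array();
foreach ($matches as $m) {
    $prices[] = array('old' => trim($m[1]), 'new' => trim($m[2]));
}

foreach ($prices as $p) {
    echo "Was " . $p['old'] . ", now " . $p['new'] . "\n";
}
```

A greedy .* in the same spot would run on to the last <strike> in the page, which is why the lazy form matters when there are several listings.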

jamesxg1 · October 27, 2009

Anyone?
