Page scraping need help.


jamesxg1

Hiya peeps,

 

OK, this is long, so here goes.

 

<?php

require("taggrab.class.php");

$urlrun = "http://www.amazon.co.uk/s/ref=amb_link_43904365_1?ie=UTF8&node=91&pf_rd_m=A3P5ROKL5A1OLE&pf_rd_s=left-1&pf_rd_r=0YH9RVM0ZQYMNAH16BN2&pf_rd_t=101&pf_rd_p=468791753&pf_rd_i=91";


$stag = '<li><a href="';
$etag="</a></li>";

$tspider = new tagSpider();

$tspider->fetchPage($urlrun);

$linkarray = $tspider->parse_array($stag, $etag); 

echo "<h2>Links present on page: ".$urlrun."</h2><br />";

foreach ($linkarray as $result) {

echo $result;

echo "<br/>";

}


?>

 

<?php

class tagSpider
{
    var $ch;     // this will hold our cURL handle
    var $html;   // this is where we dump the html we get
    var $binary; // set for binary type transfer
    var $url;    // this is the url we are going to do a pass on

    // __construct() replaces the PHP 4-style tagSpider() constructor, which PHP 8 no longer calls
    function __construct()
    {
        $this->html = "";
        $this->binary = 0;
        $this->url = "";
    }

    function fetchPage($url)
    {
        $this->url = $url;
        if (!empty($this->url)) {
            $this->ch = curl_init(); // start cURL instance
            curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, 1); // this tells cURL to return the data
            curl_setopt($this->ch, CURLOPT_URL, $this->url); // set the url to download
            curl_setopt($this->ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if any
            curl_setopt($this->ch, CURLOPT_BINARYTRANSFER, $this->binary); // tell cURL if the data is binary data or not
            $this->html = curl_exec($this->ch); // grabs the webpage from the internet
            curl_close($this->ch); // closes the connection
        }
    }

    function parse_array($beg_tag, $close_tag) // this function takes the grabbed html and picks out the pieces we want
    {
        // preg_quote() escapes any regex metacharacters in the tags; the U modifier makes .* non-greedy
        $pattern = '~' . preg_quote($beg_tag, '~') . '.*' . preg_quote($close_tag, '~') . '~siU';
        preg_match_all($pattern, $this->html, $matching_data); // match data between specified tags
        return $matching_data[0];
    }
}
?>

 

OK, there's the script. I need it to get the following (I've marked the info I need with **** ****):

 

<div class="productTitle">
<a href="A URL WILL BE HERE"> **** The Digital Photography Book: The Step-by-step Secrets for How to Make Your Photos Look Like the Pros'! ****</a>
<span class="ptBrand">**** by Scott Kelby **** </span>

<div class="newPrice">
<a href="A URL WILL BE HERE">Buy new</a>
: 
<strike>£13.99</strike>
<span>**** £6.98 ****</span>
</div>

 

How do I do this?

 

Cheers, buds.

 

James.


You're going to have to use preg_match(), I'd think.

 

Match the <div class="productTitle"> and grab everything within the <div>...</div> tags and slap it in an array?
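
A rough sketch of that suggestion, assuming each listing's markup looks like the productTitle snippet quoted in the question. The sample HTML, the example.com URLs, the second book and the extractTitles() helper are all made up for illustration, and Amazon's real markup may well differ:

```php
<?php
// Sample markup modelled on the snippet in the question; real pages may differ.
$html = <<<HTML
<div class="productTitle">
<a href="http://example.com/item-1"> The Digital Photography Book</a>
<span class="ptBrand">by Scott Kelby </span>
</div>
<div class="productTitle">
<a href="http://example.com/item-2"> Another Book</a>
<span class="ptBrand">by Some Author </span>
</div>
HTML;

// Hypothetical helper: returns one ['title' => ..., 'author' => ...] entry
// for every productTitle block found in $html.
function extractTitles(string $html): array
{
    // [^"]* skips the link URL; each (.*?) captures lazily up to the next closing tag.
    $pattern = '~<div class="productTitle">\s*'
             . '<a href="[^"]*">(.*?)</a>\s*'
             . '<span class="ptBrand">(.*?)</span>~s';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $items = [];
    foreach ($matches as $m) {
        $items[] = ['title' => trim($m[1]), 'author' => trim($m[2])];
    }
    return $items;
}

$items = extractTitles($html);
print_r($items);
```

In the original script, the string passed in as $html would be whatever fetchPage() stored in $tspider->html.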

 

I've never done screen-scraping before.

 

Amazon does have a Web Service/API available. I'd imagine it's fully documented, but I'm not sure if it'd be right for what you're trying to do.

 

Sorry I couldn't be of any more help.

 

Hiya,

 

Thanks for the reply. Ah, I see. I don't know how to use that function, and the links will be different for every listing, so I can't see it working. preg_match_all() might work, but I have no idea how to use it; I do know that .*? is what I would need to use.
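
For what it's worth, here is a small sketch of how preg_match_all() and .*? could cope with a different link on every listing. It assumes the newPrice markup quoted above; the two listings, their prices and the example.com URLs are invented for illustration. Note that the U modifier in parse_array() already makes .* behave lazily, so .*? is the same idea spelled out in the pattern itself:

```php
<?php
// Two made-up listings with different links, modelled on the newPrice markup.
$html = '<div class="newPrice"><a href="http://example.com/a">Buy new</a>: '
      . '<strike>£13.99</strike> <span>£6.98</span></div>'
      . '<div class="newPrice"><a href="http://example.com/b">Buy new</a>: '
      . '<strike>£20.00</strike> <span>£15.49</span></div>';

// Greedy: .* runs on to the LAST </span>, so both listings collapse into one match.
preg_match_all('~<span>(.*)</span>~s', $html, $greedy);

// Lazy: .*? stops at the FIRST </span>, giving one match per listing.
// [^"]* matches the href whatever URL it holds, so varying links are not a problem.
preg_match_all(
    '~<a href="[^"]*">Buy new</a>.*?<span>(.*?)</span>~s',
    $html,
    $lazy
);

echo count($greedy[1]), "\n"; // number of greedy matches
print_r($lazy[1]);            // one captured price per listing
```

The captured prices end up in $lazy[1], in the same order as the listings appear on the page.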

 

And they do have an API, but I believe you have to implement it into a site.

 

Many thanks,

 

James.
