
URGENT help needed scraping a page


ultimatum

Hey guys,

I need help scraping this page: http://director.flyerservices.com/SOB/default.aspx?banner=Sobeys&pubtype=1&language=en&view=Text&storeNumber=743
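The flyer address is the `default.aspx` page plus a query string, so it can be rebuilt for any store with PHP's built-in `http_build_query()`. This is a small sketch that assumes the other parameters stay exactly as they appear in the URL above. `sobeys_flyer_url()` is a hypothetical helper name, not something the flyer site provides:

```php
<?php
// Hypothetical helper: rebuilds the flyer URL from the question for any store number.
// Parameter names and values are copied from the original URL; only storeNumber varies.
function sobeys_flyer_url(int $storeNumber): string
{
    $base = 'http://director.flyerservices.com/SOB/default.aspx';
    $params = [
        'banner'      => 'Sobeys',
        'pubtype'     => 1,
        'language'    => 'en',
        'view'        => 'Text',
        'storeNumber' => $storeNumber,
    ];
    // http_build_query() URL-encodes each value and joins the pairs with '&'.
    return $base . '?' . http_build_query($params);
}
```

`sobeys_flyer_url(743)` returns exactly the URL above.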

It's a link to the accessible text version of a Sobeys store flyer. The problem is that this link makes a POST request to another page and then redirects, so I don't know how to capture the final page that I need.
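cURL can follow a redirect chain itself and report where it ended up. This is a minimal sketch, assuming the redirects are ordinary HTTP 3xx responses with a `Location` header (cURL does not run JavaScript, so a script-driven redirect would not be followed). `CURLOPT_FOLLOWLOCATION` makes cURL request each `Location` in turn, and `CURLINFO_EFFECTIVE_URL` then gives the URL of the last page fetched. The name `fetch_final_page()` and the cap of 10 redirects are illustrative choices:

```php
<?php
// Sketch: fetch a URL, let cURL follow every redirect, and report where it ended up.
// fetch_final_page() is a hypothetical name; the 10-redirect cap is an arbitrary limit.
function fetch_final_page(string $url): array
{
    $curl = curl_init($url);
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // request each Location header in turn
        CURLOPT_MAXREDIRS      => 10,   // give up on redirect loops
        CURLOPT_COOKIEFILE     => '',   // empty string: enable the in-memory cookie engine
        CURLOPT_TIMEOUT        => 10,
    ]);

    $body = curl_exec($curl);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($curl));
    }

    // The handle is freed automatically when $curl goes out of scope.
    return [
        'url'    => curl_getinfo($curl, CURLINFO_EFFECTIVE_URL), // last URL requested
        'status' => curl_getinfo($curl, CURLINFO_HTTP_CODE),     // status of that last response
        'body'   => $body,
    ];
}
```

Called with the flyer URL, `body` should hold the last page in the chain and `url` should show where that page lives.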

Can someone help me with this? I used the cURL functions, and all I get back from the server is an "Object moved here" response.
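"Object moved" is the small HTML body that ASP.NET/IIS typically sends along with a 302 redirect, so that response most likely means cURL fetched the redirect itself and stopped there; the code in this thread never sets `CURLOPT_FOLLOWLOCATION`. To see what the server is pointing at, a sketch like this requests the page once without following and reads the status code and `CURLINFO_REDIRECT_URL` (available since PHP 5.3.7). `inspect_redirect()` is a hypothetical name:

```php
<?php
// Sketch: request a URL once, without following redirects, and report
// the status code plus the Location target cURL would have followed next.
// inspect_redirect() is a hypothetical helper name.
function inspect_redirect(string $url): array
{
    $curl = curl_init($url);
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => false, // stop at the first response
        CURLOPT_TIMEOUT        => 10,
    ]);

    $body = curl_exec($curl);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($curl));
    }

    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    // CURLINFO_REDIRECT_URL is the target of a 3xx response; when there is
    // none, curl_getinfo() returns false, which is normalised to null here.
    $location = curl_getinfo($curl, CURLINFO_REDIRECT_URL) ?: null;

    return ['status' => $status, 'location' => $location, 'body' => $body];
}
```

If the flyer URL comes back with a 3xx `status` and a non-null `location`, the first response is only a redirect and the real content lives at `location`.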

Thank you guys in advance.

https://forums.phpfreaks.com/topic/264520-urgent-help-needed-scraping-a-page/

// get flyer page
function get_page($url)
{
    $curl = curl_init();

    // set up headers - used the same headers from Firefox version 2.0.0.6
    // below was split up because php.net said the line was too long. :/
    $header = array();
    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // browsers keep this blank.

    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3');
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);
    curl_setopt($curl, CURLOPT_COOKIEFILE, 'cookies.txt');

    $html = curl_exec($curl); // execute the curl command
    if ($html === false) // only false means failure; an empty page is not an error
    {
        echo "cURL error number: " . curl_errno($curl) . "\n";
        echo "cURL error: " . curl_error($curl) . "\n";
        exit;
    }

    curl_close($curl); // close the connection

    return $html; // and finally, return $html
}

 

OK, this is the code I use to scrape the page. I tried adding a cookie file, but it doesn't seem to help at all.
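On the cookie side: `CURLOPT_COOKIEFILE` on its own only tells cURL which file to read cookies from (setting it also switches the cookie engine on for that handle). Cookies the server sets are only saved back to disk if `CURLOPT_COOKIEJAR` is set as well. Because the cookie engine already carries cookies between the hops of a followed redirect chain, the missing piece in the code above is more likely `CURLOPT_FOLLOWLOCATION` than the cookie file. A sketch that combines the two, assuming the redirects are plain HTTP redirects cURL can follow; `get_page_with_cookies()` and the `$cookieFile` argument are hypothetical:

```php
<?php
// Sketch: the original get_page() idea plus redirect following and a cookie jar.
// get_page_with_cookies() is a hypothetical name; $cookieFile is any writable path.
function get_page_with_cookies(string $url, string $cookieFile): string
{
    $curl = curl_init($url);
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,        // follow the redirect instead of returning it
        CURLOPT_MAXREDIRS      => 10,          // arbitrary cap against redirect loops
        CURLOPT_AUTOREFERER    => true,        // send the previous URL as Referer on each hop
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from here (enables the cookie engine)
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies back here when the handle is freed
        CURLOPT_ENCODING       => 'gzip,deflate',
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3',
        CURLOPT_TIMEOUT        => 10,
    ]);

    $html = curl_exec($curl);
    if ($html === false) {
        throw new RuntimeException('cURL error ' . curl_errno($curl) . ': ' . curl_error($curl));
    }

    // The handle is freed when $curl goes out of scope; libcurl writes the jar then.
    return $html;
}
```

Reusing the same `$cookieFile` on later calls lets any session cookie from the first visit be sent again.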

Archived

This topic is now archived and is closed to further replies.
