ultimatum Posted June 20, 2012

Hey guys, I need help scraping this page: http://director.flyerservices.com/SOB/default.aspx?banner=Sobeys&pubtype=1&language=en&view=Text&storeNumber=743

It's a link to the accessible Sobeys store flyer. The problem is that this link makes a POST request to another page and then redirects, so I don't know how to capture the final page that I need. Can someone help me with this? I used the cURL functions and all I get is an "Object moved here" response from the server. Thank you guys in advance.

Link to comment https://forums.phpfreaks.com/topic/264520-urgent-help-needed-scraping-a-page/
Maq Posted June 20, 2012

Show us your code, and don't put "urgent" in your thread title; it will likely decrease replies.
ultimatum Posted June 20, 2012 (Author)

// get flyer page
function get_page($url)
{
    $curl = curl_init();

    // setup headers - used the same headers from Firefox version 2.0.0.6
    // below was split up because php.net said the line was too long. :/
    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // browsers keep this blank.

    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3');
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);
    curl_setopt($curl, CURLOPT_COOKIEFILE, 'cookies.txt');

    $html = curl_exec($curl); // execute the curl command
    if (!$html) {
        echo "cURL error number:" . curl_errno($curl);
        echo "cURL error:" . curl_error($curl);
        exit;
    }
    curl_close($curl); // close the connection

    return $html; // and finally, return $html
}

OK, this is the code I use to scrape the page. I tried adding the cookie file, but it doesn't seem to help at all.
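An "Object moved" body is typically what IIS/ASP.NET sends along with an HTTP 302 redirect, and cURL does not follow redirects unless told to; the function above never sets CURLOPT_FOLLOWLOCATION. It also only reads cookies (CURLOPT_COOKIEFILE) and never saves the ones the server sets (CURLOPT_COOKIEJAR). The following is a minimal sketch, assuming the server answers with an ordinary 3xx redirect. The helper names build_redirect_options() and fetch_final_page() are hypothetical and not from the original post. If the page instead depends on a form that the browser submits, the POST fields would have to be replicated separately, which this sketch does not attempt.

```php
<?php
// Sketch only: both function names below are hypothetical helpers,
// introduced for illustration and not part of the original post.

/**
 * Build cURL options that follow redirects and persist cookies.
 * Kept separate from the request so the settings can be inspected.
 */
function build_redirect_options($url, $cookieFile)
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,        // follow 3xx "Object moved" responses
        CURLOPT_MAXREDIRS      => 10,          // guard against redirect loops
        CURLOPT_AUTOREFERER    => true,        // set Referer on each redirect hop
        CURLOPT_COOKIEFILE     => $cookieFile, // send cookies stored in this file
        CURLOPT_COOKIEJAR      => $cookieFile, // save received cookies when the handle closes
        CURLOPT_ENCODING       => '',          // empty string: accept all supported encodings
        CURLOPT_TIMEOUT        => 10,
    );
}

/**
 * Fetch $url, following redirects, and return the final URL and body.
 * Throws RuntimeException if the transfer fails.
 */
function fetch_final_page($url, $cookieFile = 'cookies.txt')
{
    $curl = curl_init();
    curl_setopt_array($curl, build_redirect_options($url, $cookieFile));

    $html = curl_exec($curl);
    if ($html === false) {
        // Strict comparison: an empty body is not treated as an error.
        $message = 'cURL error ' . curl_errno($curl) . ': ' . curl_error($curl);
        curl_close($curl);
        throw new RuntimeException($message);
    }

    // The last URL cURL requested after following any redirects.
    $finalUrl = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);
    curl_close($curl);

    return array('url' => $finalUrl, 'html' => $html);
}
```

Calling fetch_final_page() with the flyer URL from the first post would return an array whose 'url' entry shows where the redirects ended and whose 'html' entry holds that page's body. Note that on older PHP versions CURLOPT_FOLLOWLOCATION could not be enabled while safe_mode or open_basedir was in effect.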