CrimpJiggler Posted September 20, 2013

Okay, so here's an example of a page I want to download with cURL: http://myip.ms/browse/comp_browseragents/Computer_Browser_Agents.html. As you can see, there are over 15,000 pages of these user_agent entries, and there are no URL variables. So I used tamper_data to capture the POST data, and here it is:

    14:08:58.925 [559ms] [total 559ms]
    Status: 200 [OK]
    POST http://myip.ms/ajax_table/comp_browseragents/3/
    Load Flags [LOAD_BYPASS_CACHE LOAD_BACKGROUND]
    Content Size [3904]
    Mime Type [text/html]

    Request Headers:
    Host[myip.ms]
    User-Agent[Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0]
    Accept[text/html, */*; q=0.01]
    Accept-Language[en-US,en;q=0.5]
    Accept-Encoding[gzip, deflate]
    Content-Type[application/x-www-form-urlencoded; charset=UTF-8]
    X-Requested-With[XMLHttpRequest]
    Referer[http://myip.ms/browse/comp_browseragents/Computer_Browser_Agents.html]
    Content-Length[19]
    Cookie[s2_csrf_cookie_name=23f5312826ba7a316f70bcf2555c1e94; s2_csrf_cookie_name=23f5312826ba7a316f70bcf2555c1e94; sw=141.6; sh=65.2; __utma=126509969.298336339.1379680552.1379680552.1379682469.2; __utmc=126509969; __utmz=126509969.1379682469.2.2.utmcsr=localhost|utmccn=(referral)|utmcmd=referral|utmcct=/dummy_page/test.php; __utmb=126509969.2.10.1379682469]
    DNT[1]
    Connection[keep-alive]
    Pragma[no-cache]
    Cache-Control[no-cache]

    Post Data:
    getpage[yes]
    lang[en]

    Response Headers:
    Server[nginx]
    Date[Fri, 20 Sep 2013 13:08:58 GMT]
    Content-Type[text/html; charset=utf-8]
    Content-Length[3904]
    Connection[keep-alive]
    Content-Encoding[gzip]
    Vary[Accept-Encoding]
    X-Powered-By[PleskLin]

So the only thing in there that identifies the page number is this URL: http://myip.ms/ajax_table/comp_browseragents/3/

I'm guessing I need to replicate that AJAX POST, so here's what I tried:

    $ch = curl_init('http://myip.ms/ajax_table/comp_browseragents/3/');
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13');
    curl_setopt($ch, CURLOPT_REFERER, 'http://myip.ms/browse/comp_browseragents/Computer_Browser_Agents.html');

    $data = array(
        'getpage' => 'yes',
        'lang'    => 'en'
    );

    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $curl_scraped_page = curl_exec($ch);
    curl_close($ch);

    echo $curl_scraped_page;

But when I ran the script, here's the response I got:

    Invalid Webpage URL
    Go Home

I am trying to learn how to use cURL to scrape sites effectively, but this is a problem I keep running into: I don't know how to replicate whatever it is the website is doing to get the data.
kicken Posted September 20, 2013 (Solution)

You need to add the X-Requested-With: XMLHttpRequest header. The site is checking for that to validate that it is an AJAX request. See CURLOPT_HTTPHEADER.
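For reference, a minimal sketch of that fix applied to the script from the first post (the URL, user agent, referer, and form fields are all taken from the original post; the only changes are the extra header and building the body with http_build_query so the request stays application/x-www-form-urlencoded, as in the captured headers):

    $ch = curl_init('http://myip.ms/ajax_table/comp_browseragents/3/');

    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13');
    curl_setopt($ch, CURLOPT_REFERER, 'http://myip.ms/browse/comp_browseragents/Computer_Browser_Agents.html');

    // The server checks for this header before returning the table data.
    // CURLOPT_HTTPHEADER takes an array of raw "Name: value" strings.
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'X-Requested-With: XMLHttpRequest',
    ));

    // Same form fields as in the captured request.
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
        'getpage' => 'yes',
        'lang'    => 'en',
    )));

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $curl_scraped_page = curl_exec($ch);
    curl_close($ch);

    echo $curl_scraped_page;

Note that passing an array directly to CURLOPT_POSTFIELDS makes cURL send the body as multipart/form-data; using http_build_query keeps it urlencoded, which matches the Content-Type that tamper_data recorded.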
CrimpJiggler Posted September 21, 2013 (Author)

That worked, thanks a lot. For some reason the $curl_scraped_page variable only contained the data output by the AJAX call, rather than the full web page. This is exactly what I needed, but I'm trying to figure out how it works, since the script still uses the same commands I'd use to scrape a whole page.
CrimpJiggler Posted September 21, 2013 (Author)

Ah wait, sorry, I see it was only the AJAX URL I was loading, not the main page.
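Since the page number is just the trailing segment of the AJAX URL (the /3/ above), the same request can presumably be repeated for each page. A rough sketch, assuming the pages are numbered sequentially from 1 (the page range, output filenames, and one-second delay are arbitrary choices, not anything from the thread):

    // Hypothetical loop over the first few pages; the real site has 15,000+.
    for ($page = 1; $page <= 5; $page++) {
        $ch = curl_init("http://myip.ms/ajax_table/comp_browseragents/{$page}/");

        // Same header and form fields as the working single-page request.
        curl_setopt($ch, CURLOPT_HTTPHEADER, array(
            'X-Requested-With: XMLHttpRequest',
        ));
        curl_setopt($ch, CURLOPT_REFERER, 'http://myip.ms/browse/comp_browseragents/Computer_Browser_Agents.html');
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
            'getpage' => 'yes',
            'lang'    => 'en',
        )));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

        $html = curl_exec($ch);
        curl_close($ch);

        // Each response is just the table fragment for that page,
        // not the full web page, as noted above.
        file_put_contents("page_{$page}.html", $html);

        sleep(1); // be polite to the server; the delay length is an assumption
    }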