How to replicate a POST with cURL


CrimpJiggler
Solved by kicken

Okay, so here's an example of a page I want to download with cURL:

http://myip.ms/browse/comp_browseragents/Computer_Browser_Agents.html

As you can see, there are over 15,000 pages of these user agent entries, and the page number does not appear as a URL variable. So I used Tamper Data to capture the POST data, and here it is:

14:08:58.925[559ms][total 559ms] Status: 200[OK]
POST http://myip.ms/ajax_table/comp_browseragents/3/ Load Flags[LOAD_BYPASS_CACHE  LOAD_BACKGROUND  ] Content Size[3904] Mime Type[text/html]
   Request Headers:
      Host[myip.ms]
      User-Agent[Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0]
      Accept[text/html, */*; q=0.01]
      Accept-Language[en-US,en;q=0.5]
      Accept-Encoding[gzip, deflate]
      Content-Type[application/x-www-form-urlencoded; charset=UTF-8]
      X-Requested-With[XMLHttpRequest]
      Referer[http://myip.ms/browse/comp_browseragents/Computer_Browser_Agents.html]
      Content-Length[19]
      Cookie[s2_csrf_cookie_name=23f5312826ba7a316f70bcf2555c1e94; s2_csrf_cookie_name=23f5312826ba7a316f70bcf2555c1e94; sw=141.6; sh=65.2; __utma=126509969.298336339.1379680552.1379680552.1379682469.2; __utmc=126509969; __utmz=126509969.1379682469.2.2.utmcsr=localhost|utmccn=(referral)|utmcmd=referral|utmcct=/dummy_page/test.php; __utmb=126509969.2.10.1379682469]
      DNT[1]
      Connection[keep-alive]
      Pragma[no-cache]
      Cache-Control[no-cache]
   Post Data:
      getpage[yes]
      lang[en]
   Response Headers:
      Server[nginx]
      Date[Fri, 20 Sep 2013 13:08:58 GMT]
      Content-Type[text/html; charset=utf-8]
      Content-Length[3904]
      Connection[keep-alive]
      Content-Encoding[gzip]
      Vary[Accept-Encoding]
      X-Powered-By[PleskLin]

So the only thing in there that identifies the page number is this: http://myip.ms/ajax_table/comp_browseragents/3/

I'm guessing I need to replicate that AJAX POST, so here's what I tried:

$ch = curl_init('http://myip.ms/ajax_table/comp_browseragents/3/');

curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13');
curl_setopt($ch, CURLOPT_REFERER, 'http://myip.ms/browse/comp_browseragents/Computer_Browser_Agents.html');

$data = array(
'getpage' => 'yes',
'lang' => 'en'
);

curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$curl_scraped_page = curl_exec($ch);


curl_close($ch);

echo $curl_scraped_page;

But when I ran the script, here's the response I got:

Invalid Webpage URL
Go Home

I am trying to learn how to use cURL to scrape sites effectively, but this is a problem I keep running into: I don't know how to reproduce whatever the website is doing to get the data.

That worked, thanks a lot. For some reason the $curl_scraped_page variable only contained the data returned by the AJAX request, rather than the full web page. This is exactly what I needed, but I'm trying to figure out how it works, since the script still used the same commands I'd use to scrape the whole page.
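
On the "how it works" part: curl_exec() returns the body of whichever URL was requested, nothing more. The /ajax_table/ URL presumably serves only the table fragment that the browser's JavaScript inserts into the full page at /browse/, so requesting it directly yields just that fragment. Below is a minimal sketch of how the captured request could be mirrored more closely. It is not necessarily the fix that was suggested in this thread. The function names are hypothetical; the URLs, headers, and field names are taken from the Tamper Data capture above. It assumes the server cares about some combination of the X-Requested-With header, the urlencoded body (an array passed to CURLOPT_POSTFIELDS is sent as multipart/form-data instead), and the cookies set by the normal page. Which of those actually matters has not been verified.

```php
<?php
// Sketch only. Function names are hypothetical; URLs, headers and
// field names are copied from the captured request.

/** Build the AJAX endpoint URL for a given page number. */
function ajax_page_url(int $page): string
{
    return 'http://myip.ms/ajax_table/comp_browseragents/' . $page . '/';
}

/** Encode the POST fields as a urlencoded string, as the browser did. */
function ajax_post_body(): string
{
    // A string body is sent as application/x-www-form-urlencoded;
    // an array would be sent as multipart/form-data.
    return http_build_query(['getpage' => 'yes', 'lang' => 'en']);
}

/** Headers from the capture that a plain cURL request does not send. */
function ajax_headers(): array
{
    return [
        'X-Requested-With: XMLHttpRequest',
        'Accept: text/html, */*; q=0.01',
    ];
}

/** Fetch one page of the table. Returns the response body, or false on failure. */
function fetch_ajax_page(int $page)
{
    $referer = 'http://myip.ms/browse/comp_browseragents/Computer_Browser_Agents.html';

    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEFILE     => '', // empty string: keep cookies in memory
        CURLOPT_ENCODING       => '', // accept and decode gzip/deflate
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0',
    ]);

    // Assumption: loading the normal page first lets the server set
    // its cookies (the capture shows s2_csrf_cookie_name).
    curl_setopt($ch, CURLOPT_URL, $referer);
    curl_exec($ch);

    // Then send the AJAX-style POST on the same handle, so the cookies carry over.
    curl_setopt_array($ch, [
        CURLOPT_URL        => ajax_page_url($page),
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => ajax_post_body(),
        CURLOPT_HTTPHEADER => ajax_headers(),
        CURLOPT_REFERER    => $referer,
    ]);
    $fragment = curl_exec($ch);
    curl_close($ch);

    return $fragment;
}

// Usage (performs real HTTP requests):
// echo fetch_ajax_page(3);
```

The encoded body is `getpage=yes&lang=en`, which is 19 bytes and matches the Content-Length[19] in the capture.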
