Problem w/ cURL.... or networking... IDK which


nrobi

I originally wrote code that functions somewhat like the cURL library before I knew it existed. I did a scriptlance project where I was supposed to scrape the contact information of network marketers from their company-supplied web pages, at URLs like www.domain.com/usr0001. The script then advanced to usr0002, and so on.

When I executed my code, the contact information was conveniently absent from the HTTP content that was sent back. I believe I'm now having the same problem with the same code on a different site/project. So I tried cURL and got the same results. (The original problem was with GET requests... this one is POST... I think. All I know is it's not working :(.)

I'm at a real loss here. I'm not sure why this is happening. In the current case I want to submit form data, but for some reason several hidden variables are missing... ALL the time. In the web browser they are there, of course, but not in what my script receives.

Any advice is much appreciated. The code is below:

//MAIN PROGRAM
$content = '';
$referer = '';
$curl = '';
$forms = '';

require('e:\php5\work\forms.php');

// CURLOPT_HTTPHEADER expects a list of "Name: value" strings; the keys of an associative array are ignored.
$hdr = array("Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
			"Accept-Language: en-us,en;q=0.5",
			"Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7",
			"Keep-Alive: 300",
			"Connection: keep-alive"
		);

$copt = array(CURLOPT_COOKIESESSION => true,
			CURLOPT_RETURNTRANSFER =>true,
			CURLOPT_AUTOREFERER => true,
			CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
			CURLOPT_FOLLOWLOCATION => true,
			CURLOPT_MAXREDIRS => 5,
			CURLOPT_ENCODING => "gzip,deflate",
			CURLOPT_USERAGENT => "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12",
			CURLOPT_HTTPHEADER => $hdr
		);


$curl = curl_init();

curl_setopt($curl,CURLOPT_URL,URL_DEST);
curl_setopt_array($curl,$copt);
$content = curl_exec($curl);

curl_setopt($curl,CURLOPT_REFERER,$referer = curl_getinfo($curl,CURLINFO_EFFECTIVE_URL));

$forms = extract_forms($content);
$forms = $forms[0];
send_form(0);

echo "\n\n$content";
curl_close($curl);

The function extract_forms() takes each form on the page and places it into an array. Only form-related tags are extracted; everything else is discarded.
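
Since forms.php is not shown, here is a minimal sketch of what a function like extract_forms() might do, assuming PHP's DOM extension is available. The name extract_forms_sketch() and the shape of the returned array are assumptions for illustration, not the original code. Only input and textarea elements are collected; select, checkbox, and radio handling is left out.

```php
<?php
// Hypothetical stand-in for extract_forms(); the original in forms.php is not shown.
function extract_forms_sketch(string $html): array
{
    if ($html === '') {
        return array();
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $forms = array();
    foreach ($doc->getElementsByTagName('form') as $form) {
        $fields = array();
        foreach (array('input', 'textarea') as $tag) {
            foreach ($form->getElementsByTagName($tag) as $el) {
                $name = $el->getAttribute('name');
                if ($name === '') {
                    continue; // unnamed controls are never submitted
                }
                $fields[$name] = ($tag === 'textarea')
                    ? $el->textContent
                    : $el->getAttribute('value');
            }
        }
        $method = $form->getAttribute('method');
        $forms[] = array(
            'action' => $form->getAttribute('action'),
            'method' => strtoupper($method === '' ? 'GET' : $method),
            'fields' => $fields,
        );
    }
    return $forms;
}
```

Hidden inputs are picked up like any other input here, so if they are missing from the result they were most likely missing from the HTML the script received.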

The function send_form() is a bit long. It grabs the ACTION and METHOD attributes, determines the URL to send the data to, gathers up the data into name=value pairs, makes sure it's URL-safe, and sends it off.
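
The steps described for send_form() can be sketched roughly as follows. The helper name build_form_request() and its return format are hypothetical; the real send_form() is not shown. URL resolution is simplified to empty, absolute, root-relative, and directory-relative actions.

```php
<?php
// Hypothetical helper: turns a form's ACTION, METHOD and fields into a URL and body.
function build_form_request(string $pageUrl, string $action, string $method, array $fields): array
{
    // http_build_query() URL-encodes names and values and joins the pairs with '&'.
    $data = http_build_query($fields, '', '&');

    // Resolve the ACTION attribute against the URL of the page the form came from.
    $base = parse_url($pageUrl);
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');

    if ($action === '') {
        $url = $pageUrl; // an empty action submits to the page itself
    } elseif (preg_match('#^https?://#i', $action)) {
        $url = $action; // already absolute
    } elseif ($action[0] === '/') {
        $url = $origin . $action; // root-relative
    } else {
        // Relative to the directory of the current page.
        $path = isset($base['path']) ? $base['path'] : '/';
        $dir = substr($path, 0, strrpos($path, '/') + 1);
        $url = $origin . $dir . $action;
    }

    if (strcasecmp($method, 'post') === 0) {
        return array('method' => 'POST', 'url' => $url, 'body' => $data);
    }

    // GET: append the data to the URL, respecting an existing query string.
    $sep = (strpos($url, '?') === false) ? '?' : '&';
    return array(
        'method' => 'GET',
        'url'    => ($data === '') ? $url : $url . $sep . $data,
        'body'   => null,
    );
}
```

The returned 'url' would go to CURLOPT_URL, and for POST the 'body' would go to CURLOPT_POSTFIELDS.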

The end of the function was rewritten to:

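// $curl, $method, $action and $postdata are set earlier in send_form().
// eregi() is deprecated as of PHP 5.3; stripos() gives the same case-insensitive match.
if (stripos($method, 'get') !== false) {
	curl_setopt($curl,CURLOPT_HTTPGET,true);
	$action .= '?'.$postdata;
} else {
	curl_setopt($curl,CURLOPT_POST,true);
	curl_setopt($curl,CURLOPT_POSTFIELDS,$postdata);
}

curl_setopt($curl,CURLOPT_URL,$action);
// CURLOPT_HTTPGET is set only in the GET branch: setting it unconditionally
// at this point switched the request back to GET, so the POST was never sent.
$content = curl_exec($curl);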

These functions work OK; I've been using them on different projects for years, and they behave perfectly when run against the page source saved from the browser. I just don't know why what my browser gets and what my script gets are different.
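
One way to narrow this down is to compare the hidden field names in the page source saved from the browser with those in the content returned by curl_exec(). The sketch below assumes the DOM extension is available; hidden_field_names() and missing_hidden_fields() are hypothetical names, not part of forms.php. If fields show up as missing, possible causes include the server varying its output by cookies or request headers, or the fields being added by JavaScript, which cURL does not execute.

```php
<?php
// Hypothetical helper: list the names of all hidden inputs in an HTML string.
function hidden_field_names(string $html): array
{
    if ($html === '') {
        return array();
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate invalid markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $names = array();
    foreach ($doc->getElementsByTagName('input') as $input) {
        $isHidden = strcasecmp($input->getAttribute('type'), 'hidden') === 0;
        $name = $input->getAttribute('name');
        if ($isHidden && $name !== '') {
            $names[] = $name;
        }
    }
    return $names;
}

// Hypothetical helper: hidden fields the browser copy has but the script copy lacks.
function missing_hidden_fields(string $browserHtml, string $scriptHtml): array
{
    return array_values(array_diff(
        hidden_field_names($browserHtml),
        hidden_field_names($scriptHtml)
    ));
}
```

In use, $browserHtml would be the saved page source read with file_get_contents() and $scriptHtml would be the $content returned by curl_exec().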

Thanks for your help

https://forums.phpfreaks.com/topic/175271-problem-w-curl-or-networking-idk-which/