
PHP Curl Issue


PeerFly


OK, here's what I am trying to do: I need to log into another website and download their reports. Seems easy, right? Well, it is, because I've done it with numerous other websites. However, this one is tricky. The only way I can download the report from this website is by having the correct "key" at the end of the URL. The key is generated by their server and carries over with each link that you click on their site once logged in. Sort of like a session ID, but it's not a session ID at all.

 

Anyway, this key is unique to the current cookie session for that user. If they were to log out, none of those links with that unique trailing "key" would work. They would have to log back in and use the new links.

 

The trailing key is just a 10-digit number.

 

So, I've devised a small curl script that does everything: it logs into the site, grabs that trailing "key" from a hidden text field, and downloads the CSV report. However, when I look at the downloaded report, it contains a custom Apache "This page does not exist" page from their server.

 

The problem is that it is either losing the cookie when it goes to download the report URL, or it logs out/back in when it goes to download the report URL.

 

Here's the code (I've removed the URLs):

 

// $loginurl, $usernamefield, $username, $passwordfield and $password are set elsewhere (removed here).
$graburl = '**URL OF CSV REPORT**';
$cookiefile = tempnam("/tmp", "cookies");
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $loginurl);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiefile);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiefile);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "$usernamefield=$username&$passwordfield=$password");

// Without CURLOPT_RETURNTRANSFER, curl_exec() prints the response and returns true or false,
// so the printed login page is buffered and thrown away here.
ob_start();
$output = curl_exec($ch);
ob_end_clean();

// **ADD CODE**
$link = '**URL OF REPORT PAGE TO FETCH THE KEY**';
curl_setopt($ch, CURLOPT_URL, $link);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$page = curl_exec($ch);

$sk = '%<input type="hidden" id="window_name" value="(.*)">%';
preg_match_all($sk, $page, $results, PREG_PATTERN_ORDER);
// **ADD CODE**

if ($output == '1') {
    curl_setopt($ch, CURLOPT_URL, "$graburl&session_key=".$results[1][0]);
    $putcsv = "report.csv";
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiefile);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiefile);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($ch);
    file_put_contents("**MY URL**".$putcsv, $result);
}
curl_close($ch);

 

Note that this code works beautifully on other sites where I get reports from as well. It's just the extra code marked in "**ADD CODE**" and the "&session_key=".$results[1][0]" added to the graburl that is giving me problems.

 

It may have something to do with grabbing two URLs in the same curl session; I can't figure it out.
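As a side note on the download URL: the script above appends "&session_key=..." directly, which only yields a valid URL if the report URL already contains a query string. The sketch below shows one way to append the key in either case. The function name appendQueryParam and the example.com URL are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper (not from the original script): append one query
// parameter to a URL whether or not it already has a query string.
// Simplification: URLs containing a #fragment are not handled.
function appendQueryParam(string $url, string $name, string $value): string
{
    $separator = (strpos($url, '?') === false) ? '?' : '&';

    return $url . $separator . rawurlencode($name) . '=' . rawurlencode($value);
}

// Placeholder URL and key, for illustration only.
echo appendQueryParam('http://example.com/report.csv?type=daily', 'session_key', '1234567890'), "\n";
```

With a helper like this, the same call works whether the report URL ends in ".csv" or already carries other parameters.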


No, this isn't on the same server, I'm curling into another server.

 

No redirects involved.

 

If I make it print the $results[1][0] bit and exit before the rest of the script executes, I can tell that it captures the key, as it shows a new unique 10-digit code with each page refresh. So it's grabbing the key just fine, but when the script goes to call that second URL within the same session, it either loses the cookie or gets a new one. It seems as if I need a method to call multiple URLs using the same curl session (and same cookies).
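For checking the captured key in isolation, here is a minimal sketch of the extraction step as a standalone function. It assumes the hidden field looks like the one in the pattern from the first post (id="window_name" followed by a 10-digit value); the function name extractSessionKey and the sample markup are made up. Unlike the greedy (.*) in the original pattern, it only accepts exactly ten digits, so it cannot run on into later markup on the same line.

```php
<?php
// Hypothetical helper: pull the 10-digit key out of the hidden field.
// Assumes the id attribute appears before the value attribute, as in
// the pattern posted above. Returns null when no such field is found.
function extractSessionKey(string $html): ?string
{
    $pattern = '%<input\b[^>]*\bid="window_name"[^>]*\bvalue="(\d{10})"%i';

    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1];
    }

    return null;
}

// Made-up sample markup with two inputs on one line.
$sample = '<input type="hidden" id="window_name" value="1234567890"><input type="hidden" id="other" value="x">';

var_dump(extractSessionKey($sample));
```

Returning null instead of an empty match makes it easy to stop the script before the download request when the key page did not come back as expected.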


Instead of using the cURL cookie file, read the headers from the cURL login request, grab the Set-Cookie lines from them, and attach those cookies to your cURL download request. The server is most likely setting a browser ONLY based cookie, so you need to grab it from the header because it's not going to be placed in the cookie file!
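A rough sketch of that suggestion, limited to the parsing part so it can be run without a network connection: it takes the raw response headers as a string and builds the value for a Cookie request header. The function name cookieHeaderFromResponse and the sample header text are made up; the sketch ignores cookie attributes such as expires, domain, path and secure, and simply lets a later Set-Cookie line overwrite an earlier one with the same name.

```php
<?php
// Hypothetical helper: turn the Set-Cookie lines of a raw header block
// into a "name=value; name2=value2" string.
// Simplification: attributes (expires, path, domain, secure) are ignored.
function cookieHeaderFromResponse(string $rawHeaders): string
{
    $cookies = array();

    foreach (preg_split('/\r\n|\n|\r/', $rawHeaders) as $line) {
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }

        // Keep only the "name=value" part in front of the first ';'.
        $pair  = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
        $parts = explode('=', $pair, 2);

        if (count($parts) === 2 && $parts[0] !== '') {
            $cookies[$parts[0]] = $parts[1]; // last one wins
        }
    }

    $out = array();
    foreach ($cookies as $name => $value) {
        $out[] = $name . '=' . $value;
    }

    return implode('; ', $out);
}

// Made-up header block for illustration.
$headers = "HTTP/1.1 302 Found\r\n"
         . "Set-Cookie: login=deleted; expires=Wed, 13-Feb-2008 10:53:30 GMT; path=/\r\n"
         . "Set-Cookie: login=abc123; path=/; secure\r\n"
         . "Set-Cookie: lang=en\r\n"
         . "Location: /home";

echo cookieHeaderFromResponse($headers), "\n";
```

The resulting string is in the format that curl_setopt() accepts for CURLOPT_COOKIE on the follow-up request.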


Ok, this is what it's come down to...

 

I am able to browse to multiple pages on their server using the same cookie, no problem. I am able to grab that key from a hidden text field and append it to the url of the csv report, no problem. However, when I parse the results and put the contents into a file, it always brings back the same error you would get if you tried accessing the page with a different cookie or a wrong trailing 10 digit key. However, it's all correct!

 

If I do this manually on their site, it works fine so I know it isn't anything to do with them.

 

I'm absolutely stuck.


Run this script, changing the settings at the top, then zip up the out.txt file and PM me a link to download it. After that I will post some code to handle fetching your CSV file...

 

 


$path = 'http://www.site.com/login.asp';
$post = $usernamefield . '=' . $username . '&' . $passwordfield . '=' . $password;

$io = curl_init ();
curl_setopt ( $io, CURLOPT_URL, $path );
curl_setopt ( $io, CURLOPT_TIMEOUT, 4 );
curl_setopt ( $io, CURLOPT_ENCODING, '' );
curl_setopt ( $io, CURLOPT_MAXREDIRS, 3 );
curl_setopt ( $io, CURLOPT_FOLLOWLOCATION, true );
curl_setopt ( $io, CURLOPT_RETURNTRANSFER, true );
curl_setopt ( $io, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)' );
curl_setopt ( $io, CURLOPT_HEADER, true );
curl_setopt ( $io, CURLOPT_CUSTOMREQUEST, 'POST' );
curl_setopt ( $io, CURLOPT_POSTFIELDS, $post );

file_put_contents ( './out.txt', trim ( curl_exec ( $io ) ) );
curl_close ( $io );


I wish you had given me all the correct URLs, but hopefully you can figure it out.

 


<?php

function curlExecute ( $options )
{
	$io = curl_init ();

	curl_setopt ( $io, CURLOPT_HEADER, true );
	curl_setopt ( $io, CURLOPT_HTTPHEADER, array ( 'Expect:' ) );
	curl_setopt ( $io, CURLOPT_TIMEOUT, 4 );
	curl_setopt ( $io, CURLOPT_ENCODING, '' );
	curl_setopt ( $io, CURLOPT_MAXREDIRS, 5 );
	curl_setopt ( $io, CURLOPT_FOLLOWLOCATION, true );
	curl_setopt ( $io, CURLOPT_RETURNTRANSFER, true );
	curl_setopt ( $io, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)' );

	if ( $options['request_type'] == 'post' )
	{
		curl_setopt ( $io, CURLOPT_POST, true );
		curl_setopt ( $io, CURLOPT_POSTFIELDS, $options['post_fields'] );
		curl_setopt ( $io, CURLOPT_URL, $options['post_url'] );
	}
	else
	{
		$url = $options['get_url'];

		/* only append a query string when there are fields to send;
		   trimming the last character unconditionally would cut the URL short */
		if ( ! empty ( $options['get_fields'] ) )
		{
			$url .= '?' . http_build_query ( $options['get_fields'] );
		}

		curl_setopt ( $io, CURLOPT_URL, $url );
	}

	if ( true === $options['use_ssl'] )
	{
		/* note: this turns off certificate verification */
		curl_setopt ( $io, CURLOPT_SSL_VERIFYHOST, false );
		curl_setopt ( $io, CURLOPT_SSL_VERIFYPEER, false );
	}

	if ( true === $options['use_cookie'] )
	{
		if ( ! file_exists ( $options['cookie_file'] ) )
		{
			file_put_contents ( $options['cookie_file'], '' );
		}

		curl_setopt ( $io, CURLOPT_COOKIEJAR, $options['cookie_file'] );
		curl_setopt ( $io, CURLOPT_COOKIEFILE, $options['cookie_file'] );
	}

	$out = curl_exec ( $io );

	curl_close ( $io );

	/* returns array ( header, body ); when a redirect is followed,
	   the "body" part starts with the headers of the next response */
	return explode ( "\r\n\r\n", $out, 2 );
}

/* 1... login page */

$options = array (
	'request_type' => 'post',
	'use_ssl'      => true,
	'use_cookie'   => true,
	'cookie_file'  => './cookie.txt',
	'post_fields'  => array ( 'login_type' => '', 'login_name' => 'data', 'login_password' => 'data' ),
	'get_fields'   => array (),
	'post_url'     => 'https://login.azoogleads.com/affiliate/login/process_login',
	'get_url'      => '' /* don't include the '?' */
);

list ( $header, $body ) = curlExecute ( $options );

/* 2... grab key page */

$options = array (
	'request_type' => 'get',
	'use_ssl'      => true,
	'use_cookie'   => true,
	'cookie_file'  => './cookie.txt',
	'post_fields'  => array (),
	'get_fields'   => array (),
	'post_url'     => '',
	'get_url'      => 'http://site.com/affiliate_files.php' /* don't include the '?' */
);

list ( $header, $body ) = curlExecute ( $options );

/* do your regex to get the key in the hidden field */

/* 3... download the file */

$options = array (
	'request_type' => 'get',
	'use_ssl'      => true,
	'use_cookie'   => true,
	'cookie_file'  => './cookie.txt',
	'post_fields'  => array (),
	'get_fields'   => array ( 'session_key' => 'session_key_value' ),
	'post_url'     => '',
	'get_url'      => 'http://site.com/downloads.php' /* don't include the '?' */
);

list ( $header, $body ) = curlExecute ( $options );

/* echo $body; */

?>
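One detail of the script above is worth a separate sketch: with CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION both enabled, curl returns the headers of every response in the redirect chain, so a single split on the first blank line leaves the later header blocks inside the body. The function below, with the made-up name splitResponse and made-up sample data, peels off header blocks for as long as the remaining text starts with an HTTP status line. It is only a sketch and would misfire on a body that itself begins with a line like "HTTP/1.1 200".

```php
<?php
// Hypothetical helper: separate all header blocks from the final body.
// Returns array(list of header blocks, body).
function splitResponse(string $raw): array
{
    $headers = array();
    $rest    = $raw;

    // Keep peeling while what remains starts with an HTTP status line.
    while (preg_match('%^HTTP/\d(?:\.\d)? \d{3}%', $rest) === 1) {
        $parts     = explode("\r\n\r\n", $rest, 2);
        $headers[] = $parts[0];
        $rest      = isset($parts[1]) ? $parts[1] : '';
    }

    return array($headers, $rest);
}

// Made-up response: a 302 redirect followed by the final 200 response.
$raw = "HTTP/1.1 302 Found\r\nLocation: /home\r\n\r\n"
     . "HTTP/1.1 200 OK\r\nContent-Type: text/csv\r\n\r\n"
     . "a,b\r\n1,2";

list($headerBlocks, $body) = splitResponse($raw);

echo count($headerBlocks), " header block(s)\n";
echo $body, "\n";
```

Writing only the final body to the CSV file avoids saving stray header lines at the top of the report when the server redirects.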


Ok, this is definitely progress. I've added the code and echoed all 3 of the header/bodies.

 

The first print out is:

 

HTTP/1.1 302 Found
Date: Thu, 12 Feb 2009 10:53:30 GMT
Server: Apache/2.0.59
X-Powered-By: PHP/5.1.5
Set-Cookie: affiliate_login_credentials=deleted; expires=Wed, 13-Feb-2008 10:53:30 GMT; path=/; domain=azoogleads.com
Set-Cookie: affiliate_login_credentials=1018ca0891839f257f1b5ab3246a986d56974; path=/; domain=azoogleads.com; secure
Location: /affiliate/home/welcome_page
Content-Length: 0
Content-Type: text/html

 

The second print out (the key page), shows the following header as well as the body:

 

HTTP/1.1 200 OK
Date: Thu, 12 Feb 2009 11:03:38 GMT
Server: Apache/2.0.59
X-Powered-By: PHP/5.1.5
Transfer-Encoding: chunked
Content-Type: text/html

 

The third print out (which should be the report) shows nothing.

 

If I place the third $body into a file_put_contents for the csv report, the file is empty. So it seems to be giving the same result...


If I instruct the script to create a new cookie with each grab (cookie1, cookie2, cookie3) it creates 2 cookies that contain totally different information. This must be the problem. How can I make the second cookie append to the first cookie file instead of overwriting it with new information?

 

Here is the first cookie:

 

.azoogleads.com TRUE / TRUE 0 affiliate_login_credentials 3c94f2944ffba8be017af32715fe13f383531

 

Here is the second cookie:

 

login.azoogleads.com FALSE / FALSE 1234437466 requested_controller affiliatestats

login.azoogleads.com FALSE / FALSE 1234437466 requested_action SubReport

 

Note that if I do it with just one cookie file, only the first cookie's data is present in the file (cookie1). Nothing is added to the file; it's just overwritten.
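To make those cookie lines easier to read: each line in a curl cookie file has seven tab-separated fields (domain, include-subdomains flag, path, secure flag, expiry time, name, value). In the first cookie above the secure flag is TRUE, which means it is only sent over https. The sketch below parses a line and checks whether the cookie would apply to a given URL. The function names are made up, the matching rules are simplified, and whether the secure flag is actually the cause of the problem in this thread is an assumption that has not been confirmed.

```php
<?php
// Hypothetical helper: parse one line of a Netscape-format cookie file.
// Returns null for blank lines, comments and malformed lines.
function parseCookieJarLine(string $line): ?array
{
    $line = rtrim($line, "\r\n");

    if (strpos($line, '#HttpOnly_') === 0) {
        $line = substr($line, 10); // curl marks HttpOnly cookies with this prefix
    } elseif ($line === '' || $line[0] === '#') {
        return null;
    }

    $f = explode("\t", $line);
    if (count($f) !== 7) {
        return null;
    }

    return array(
        'domain'     => $f[0],
        'subdomains' => $f[1] === 'TRUE',
        'path'       => $f[2],
        'secure'     => $f[3] === 'TRUE',
        'expires'    => (int) $f[4],
        'name'       => $f[5],
        'value'      => $f[6],
    );
}

// Hypothetical helper: simplified check of scheme, host and path prefix.
function cookieAppliesTo(array $cookie, string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }

    if ($cookie['secure'] && strtolower($parts['scheme']) !== 'https') {
        return false; // secure cookies are not sent over plain http
    }

    $host   = strtolower($parts['host']);
    $domain = strtolower(ltrim($cookie['domain'], '.'));
    $suffix = '.' . $domain;
    $hostOk = ($host === $domain)
        || ($cookie['subdomains'] && substr($host, -strlen($suffix)) === $suffix);

    $path = isset($parts['path']) ? $parts['path'] : '/';

    return $hostOk && strpos($path, $cookie['path']) === 0;
}

// The first cookie line from the post above, written with real tabs.
$first = parseCookieJarLine(".azoogleads.com\tTRUE\t/\tTRUE\t0\taffiliate_login_credentials\t3c94f2944ffba8be017af32715fe13f383531");

var_dump(cookieAppliesTo($first, 'https://login.azoogleads.com/affiliate/home/welcome_page'));
var_dump(cookieAppliesTo($first, 'http://login.azoogleads.com/affiliate/home/welcome_page'));
```

If the second check comes back false for the real download URL, requesting that URL over https instead of http would be one thing to try.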


I wish these forums gave more time to modify recent posts. Anyway, I changed the file_put_contents code for the cookie file to :

 

if ( ! file_exists ( $options['cookie_file'] ) ) {
	file_put_contents ( $options['cookie_file'], '' );
}
else {
	file_put_contents ( $options['cookie_file'], '', FILE_APPEND );
}

 

You would think that doing that would append the other cookie to the file, but it looks like it doesn't get added at all. I don't get it.


Disregard the above post... I didn't realize that no content was being added to the file_put_contents. Here's what I changed:

 

if ( ! file_exists ( 'cookie.txt' ) ) {
	file_put_contents ( 'cookie.txt', '' );
	curl_setopt ( $io, CURLOPT_COOKIEJAR, 'cookie.txt' );
	curl_setopt ( $io, CURLOPT_COOKIEFILE, 'cookie.txt' );
}
else {
	file_put_contents ( 'cookie2.txt', '' );
	curl_setopt ( $io, CURLOPT_COOKIEJAR, 'cookie2.txt' );
	curl_setopt ( $io, CURLOPT_COOKIEFILE, 'cookie2.txt' );
	$content = file_get_contents ( 'cookie2.txt' );
	file_put_contents ( 'cookie.txt', $content, FILE_APPEND );
	curl_setopt ( $io, CURLOPT_COOKIEJAR, 'cookie.txt' );
	curl_setopt ( $io, CURLOPT_COOKIEFILE, 'cookie.txt' );
}

 

After all that, it's giving me the same result as your original code. Nothing is appended to the cookie file and no CSV report is created.

 

I've gotta hit the sack, thanks for all your help. Hopefully you or someone else can find a solution to this issue. Looks as though it is something to do with cookies for sure.


If I could see what you are actually doing I could help you so much better. Right now I am just playing the guessing game. I know I can fix it, because there is no way a server can block login access if you play the browser game, meaning you act like a real web browser. If you want me to fix it without having to keep guessing, send me the login info, the URL to log in (login type: affiliate or advertiser), the URL of the page where you grab that hidden input key, and the URL of the file download. I promise I will not look at or touch anything not related to downloading the file you want. That's all I can offer you, because it's the only way I can see exactly what is happening, and it will allow me to fix the problem without guessing. If you want to do that, send that information to (jbricci(AT)gmail.com) and I will have a working script for you later today.

