
i'm new to php and don't really know where to start here...  i'm automating a system that scrapes a site for a particular pdf download link (got this far), downloads the pdf, parses it, etc...  the problem is that you must be logged in (as you would be in the browser) in order to access the pdf...  if you're not logged in, you're redirected and the download fails...  i do have a valid login...

 

how would i go about using my login in order to automate the pdf download?

is there a way to send the login with the url request?  or open a stream, log in, and retry the download?

 

thanks

 


hmn...  i've been playing around with curl stuff, and it looks like my login is working fine... but i'm still at a loss as to how i can then load a pdf file (large file) into a variable and know when the pdf is ready to be read...  this is what i'm doing, and evidently the curl_exec returns true...

 

$curl = curl_init(); 
curl_setopt( $curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC );
curl_setopt( $curl, CURLOPT_USERPWD, "user:pass" );
curl_setopt( $curl, CURLOPT_URL, $this->pdfURL);
curl_exec( $curl );

 

how do i write the pdf file to a variable (fopen/fread aren't working)?

how do i track the progress of the pdf download/write?

 

thanks

i added that as well as a couple other options...

 

curl_setopt( $curl, CURLOPT_HEADER, true );
curl_setopt( $curl, CURLOPT_POST, true);
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true);

 

and now curl_exec returns:  HTTP/1.1 302 Found Date: Tue, 31 Aug 2010 05:39:29 GMT Server: Apache X-Powered-By: PHP/5.2.6 Location: /archive/2010/08/page/0001 Cache-Control: max-age=14400 Expires: Tue, 31 Aug 2010 09:39:29 GMT Content-Length: 0 Content-Type: text/html

 

does this indicate that it found the pdf?

the content length 0 concerns me.

how do i write the pdf file contents to a local variable?

the pdf could be up to 500kb...  won't i need to wait until it's been loaded with some sort of oncomplete callback?

 

thanks

curl_exec will block until the transfer is complete..
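since curl_exec blocks, no "oncomplete" callback is needed: with CURLOPT_RETURNTRANSFER set, the return value is the whole body once the call returns. here's a minimal sketch of that, with an optional progress hook. fetch_with_progress() is a made-up helper name, and the callback signature shown (handle first) assumes PHP 5.5 or newer.

```php
<?php
// Sketch only: fetch_with_progress() is a hypothetical helper, not part of cURL.
// Assumes PHP 5.5+, where the progress callback receives the handle as its first argument.
function fetch_with_progress( $url, $onProgress = null )
{
    $curl = curl_init( $url );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );   // return the body instead of printing it

    if( $onProgress !== null )
    {
        curl_setopt( $curl, CURLOPT_NOPROGRESS, false );  // progress callbacks are off by default
        curl_setopt( $curl, CURLOPT_PROGRESSFUNCTION,
            function( $handle, $dlTotal, $dlNow, $ulTotal, $ulNow ) use ( $onProgress )
            {
                $onProgress( $dlNow, $dlTotal );          // bytes received so far, expected total (0 if unknown)
                return 0;                                 // a non-zero return would abort the transfer
            } );
    }

    $body  = curl_exec( $curl );                          // blocks until the transfer has finished
    $error = curl_error( $curl );

    if( $body === false )
    {
        throw new RuntimeException( "cURL error: " . $error );
    }
    return $body;                                         // the handle is released when the function returns
}
```

called as `$pdf = fetch_with_progress( $url );`, the variable holds the complete response as soon as the line finishes, so a 500kb file needs no special handling.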

 

curl_multi_exec will not block.. but unfortunately php doesn't really utilize events like it should, so curl_multi_exec should only be used when needed..
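for reference, the non-blocking interface in php is the curl_multi_* family of functions. a rough sketch of the usual polling loop looks like this; fetch_all() is a hypothetical helper and the one second select timeout is an arbitrary choice.

```php
<?php
// Sketch only: fetch_all() is a hypothetical helper built on the curl_multi_* functions.
function fetch_all( array $urls )
{
    $multi   = curl_multi_init();
    $handles = array();

    foreach( $urls as $key => $url )
    {
        $handles[ $key ] = curl_init( $url );
        curl_setopt( $handles[ $key ], CURLOPT_RETURNTRANSFER, true );
        curl_multi_add_handle( $multi, $handles[ $key ] );
    }

    do
    {
        $status = curl_multi_exec( $multi, $running );        // returns right away; $running = transfers still active
        if( $running > 0 ) curl_multi_select( $multi, 1.0 );  // wait up to 1s for network activity
    }
    while( $running > 0 && $status === CURLM_OK );

    $results = array();
    foreach( $handles as $key => $handle )
    {
        $results[ $key ] = curl_multi_getcontent( $handle );  // body collected for this handle
        curl_multi_remove_handle( $multi, $handle );
    }
    curl_multi_close( $multi );

    return $results;
}
```

for a single pdf this buys nothing over plain curl_exec; it only pays off when several downloads should run at the same time.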

 

for NOW what you should worry about is why the transfer is failing.. you received http status code 302.. which is a temporary redirect (the Location header says where to go next), so you need to tell curl to follow redirects :)

 

you do this with the curl option CURLOPT_FOLLOWLOCATION
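to make the redirect visible, here's a small sketch that pulls the status code and Location out of a raw header string like the one posted above, plus the options that tell curl to follow it. describe_response() and follow_redirects() are made-up helper names, and the limit of 5 hops is an arbitrary assumption.

```php
<?php
// Sketch only: both helpers are hypothetical, written for illustration.

// Pull the status code and Location header out of a raw header string.
function describe_response( $rawHeaders )
{
    $status   = null;
    $location = null;

    if( preg_match( '#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $rawHeaders, $m ) )
    {
        $status = (int) $m[ 1 ];
    }
    if( preg_match( '/(?:^|\s)Location:\s*(\S+)/im', $rawHeaders, $m ) )
    {
        $location = $m[ 1 ];
    }

    return array(
        'status'      => $status,
        'location'    => $location,
        'is_redirect' => $status >= 300 && $status < 400 && $location !== null,
    );
}

// Tell an existing handle to follow Location headers, with a cap on the number of hops.
function follow_redirects( $curl, $maxHops = 5 )
{
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
    curl_setopt( $curl, CURLOPT_MAXREDIRS, $maxHops );
}
```

a 3xx status with Content-Length: 0 is normal: the response body is empty because the real content lives at the Location url.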

oh, damn...  now that i look at it, i see that it's redirecting in the same manner it does when doing it all manually through the browser...  EX: if you had tried the download before logging in, you'd be prompted for your login, but once login is accepted it sends you to a different entry page (the uri listed in the 302 status) from which you have to re-navigate to the pdf download.

 

is there a way to establish a connection (the way a browser does once you've logged in) and then attempt the download?  or simply re-attempt the download without losing your logged in status?

 

thanks

 

 

got it working...  needed to make one call, set the cookie and then attempt the download...  i'm sure there's some redundancy in there, but it works...

more info: http://www.php.net/manual/en/function.curl-setopt.php

 

// use the browser's agent string when there is one, otherwise a fixed fallback (e.g. when run from cron)
$agent = isset( $_SERVER[ 'HTTP_USER_AGENT' ] ) ? $_SERVER[ 'HTTP_USER_AGENT' ] : "Mozilla/5.0";
$ref_url = "http://somesite.com"; // in case they don't allow automated logins
$data = "handle=username&password=pass"; // syntax pulled from firebug's post
// start each run with an empty cookie file
$fp = fopen( "cookie.txt", "w" );
fclose( $fp );

// request 1: post the login form so the session cookie gets stored in cookie.txt
$curl = curl_init();
curl_setopt( $curl, CURLOPT_URL, "http://somesite.com/login.php" );
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 ); // return the response instead of printing it
curl_setopt( $curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC );
curl_setopt( $curl, CURLOPT_USERPWD, "username:pass" );
curl_setopt( $curl, CURLOPT_USERAGENT, $agent );
curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $curl, CURLOPT_COOKIEFILE, "cookie.txt" ); // cookies to send
curl_setopt( $curl, CURLOPT_COOKIEJAR, "cookie.txt" ); // where received cookies are saved
curl_setopt( $curl, CURLOPT_SSLVERSION, 3 );
// certificate checks are switched off here; ok for testing, not for production
curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, 0 );
curl_setopt( $curl, CURLOPT_SSL_VERIFYHOST, 0 );
curl_setopt( $curl, CURLOPT_HEADER, true );
curl_setopt( $curl, CURLOPT_POST, true );
curl_setopt( $curl, CURLOPT_TIMEOUT, 40 );
curl_setopt( $curl, CURLOPT_REFERER, $ref_url );
curl_setopt( $curl, CURLOPT_POSTFIELDS, $data );
$result = curl_exec( $curl );
if( $error = curl_error( $curl ) ) echo( "<br><--- cURL ERROR:" . $error . " --->" );
curl_close( $curl ); // libcurl writes the cookie jar when the handle is released
//echo( "<br><--- curl:" . $result . " --->" );

// request 2: ask for the pdf, sending the cookie saved by the login request
$curl = curl_init();
curl_setopt( $curl, CURLOPT_URL, $this->pdfURL );
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt( $curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC );
curl_setopt( $curl, CURLOPT_USERPWD, "username:pass" );
curl_setopt( $curl, CURLOPT_USERAGENT, $agent );
curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $curl, CURLOPT_COOKIEFILE, "cookie.txt" );
curl_setopt( $curl, CURLOPT_COOKIEJAR, "cookie.txt" );
curl_setopt( $curl, CURLOPT_SSLVERSION, 3 );
curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, 0 );
curl_setopt( $curl, CURLOPT_SSL_VERIFYHOST, 0 );
curl_setopt( $curl, CURLOPT_HEADER, true ); // $result will start with the response headers
curl_setopt( $curl, CURLOPT_POST, true );
curl_setopt( $curl, CURLOPT_TIMEOUT, 40 );
curl_setopt( $curl, CURLOPT_REFERER, $ref_url );
curl_setopt( $curl, CURLOPT_POSTFIELDS, $data );
$result = curl_exec( $curl );
if( $error = curl_error( $curl ) ) echo( "<br><--- cURL ERROR:" . $error . " --->" );
curl_close( $curl );
echo( "<br><--- curl:" . $result . " --->" );
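
one thing to watch with the code above: because CURLOPT_HEADER is on, $result starts with the response headers, so it isn't a clean pdf yet. a possible way to separate the two and sanity-check the body is sketched below. split_response() and looks_like_pdf() are hypothetical helpers, and the "%PDF-" check assumes the file starts with the usual pdf signature.

```php
<?php
// Sketch only: both helpers are hypothetical, written for illustration.

// Split a raw cURL response (headers + body) at the given header size.
function split_response( $raw, $headerSize )
{
    return array(
        'headers' => (string) substr( $raw, 0, $headerSize ),
        'body'    => (string) substr( $raw, $headerSize ),
    );
}

// Rough check that a string is a pdf: pdf files normally begin with "%PDF-".
function looks_like_pdf( $body )
{
    return strncmp( $body, "%PDF-", 5 ) === 0;
}
```

the header size comes from `curl_getinfo( $curl, CURLINFO_HEADER_SIZE )`, which has to be read before curl_close(); with redirects followed it covers the headers of every hop. if looks_like_pdf() says no, the body is most likely the login or entry page again, and if it says yes, `file_put_contents( "download.pdf", $parts[ 'body' ] )` would save it (the filename is just an example).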
