wilbur_wc Posted August 31, 2010

I'm new to PHP and don't really know where to start here. I'm automating a system that scrapes a site for a particular PDF download link (got this far), downloads it, parses the PDF, etc. The problem is that you must be logged in (when viewing in the browser) to access the PDF. If you're not logged in, you are redirected and the download fails. I do have a valid login.

How would I go about using my login to automate the PDF download? Is there a way to send the login with the URL request? Or open a stream, log in, and retry the download?

Thanks

Link: https://forums.phpfreaks.com/topic/212132-loading-a-secured-link/
RussellReal Posted August 31, 2010

curl_exec
wilbur_wc (Author) Posted August 31, 2010

Hmm. I've been playing around with the cURL functions, and it looks like my login is working fine, but I'm still at a loss as to how I can then load a (large) PDF file into a variable and know when the PDF is ready to be read. This is what I'm doing, and evidently curl_exec returns true:

    $curl = curl_init();
    curl_setopt( $curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC );
    curl_setopt( $curl, CURLOPT_USERPWD, "user:pass" );
    curl_setopt( $curl, CURLOPT_URL, $this->pdfURL );
    curl_exec( $curl );

How do I write the PDF file to a variable (fopen/fread aren't working)? How do I track the progress of the PDF download/write?

Thanks
RussellReal Posted August 31, 2010

Use curl_setopt to set CURLOPT_RETURNTRANSFER to true.
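To illustrate the suggestion above: with CURLOPT_RETURNTRANSFER enabled, curl_exec returns the response body as a string instead of printing it and returning true. The following is only a minimal sketch; the helper name fetch_body and the "user:pass" style credential string are assumptions for illustration, not something from the thread.

```php
<?php
// Sketch: fetch a URL into a string using CURLOPT_RETURNTRANSFER.
// fetch_body() is a hypothetical helper name used only for this example.
function fetch_body(string $url, string $userPwd): string
{
    $curl = curl_init($url);
    // Return the body from curl_exec() instead of printing it.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
    curl_setopt($curl, CURLOPT_USERPWD, $userPwd); // "user:pass" format

    // curl_exec() blocks until the transfer has finished (or failed).
    $body = curl_exec($curl);
    if ($body === false) {
        $error = curl_error($curl);
        curl_close($curl);
        throw new RuntimeException('cURL error: ' . $error);
    }
    curl_close($curl);
    return $body;
}
```

A call would look like `$pdf = fetch_body($pdfUrl, "user:pass");`, after which `strlen($pdf)` gives the number of bytes received. Note that the whole response is held in memory, which should be fine for a file of a few hundred kilobytes.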
wilbur_wc (Author) Posted August 31, 2010

I added that as well as a couple of other options:

    curl_setopt( $curl, CURLOPT_HEADER, true );
    curl_setopt( $curl, CURLOPT_POST, true );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );

and now curl_exec returns:

    HTTP/1.1 302 Found
    Date: Tue, 31 Aug 2010 05:39:29 GMT
    Server: Apache
    X-Powered-By: PHP/5.2.6
    Location: /archive/2010/08/page/0001
    Cache-Control: max-age=14400
    Expires: Tue, 31 Aug 2010 09:39:29 GMT
    Content-Length: 0
    Content-Type: text/html

Does this indicate that it found the PDF? The Content-Length of 0 concerns me. How do I write the PDF file contents to a local variable? The PDF could be up to 500 KB. Won't I need to wait until it has been loaded, with some sort of on-complete callback?

Thanks
RussellReal Posted August 31, 2010

curl_exec will block until the transfer is complete. curl_multi_exec will not block, but unfortunately PHP doesn't really utilize events like it should, so the multi interface should only be used when needed.

For NOW what you should worry about is why the transfer is failing. You received HTTP status code 302 (Found), which is a redirect to the URL in the Location header. You need to tell PHP to follow redirects; you do this with the cURL option CURLOPT_FOLLOWLOCATION.
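As a rough sketch of how the status code could be checked in code: when CURLOPT_HEADER is on, the string returned by curl_exec begins with a header block like the one posted earlier in the thread, and the status code can be read from its first line. The helper names parse_status_code and is_redirect are assumptions made up for this example.

```php
<?php
// Sketch: read the HTTP status code from a raw header block.
// parse_status_code() and is_redirect() are hypothetical helper names.
function parse_status_code(string $rawHeaders): ?int
{
    // The status line looks like "HTTP/1.1 302 Found".
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#m', $rawHeaders, $m)) {
        return (int) $m[1];
    }
    return null; // no status line found
}

function is_redirect(int $code): bool
{
    // Treats the whole 3xx class as "redirect" for simplicity.
    return $code >= 300 && $code < 400;
}

// Shortened copy of the header block from the thread.
$headers = "HTTP/1.1 302 Found\r\n"
         . "Location: /archive/2010/08/page/0001\r\n"
         . "Content-Length: 0\r\n";

$code = parse_status_code($headers);
echo $code, is_redirect($code) ? " (redirect)" : "", "\n"; // prints: 302 (redirect)
```

In practice, `curl_getinfo($curl, CURLINFO_HTTP_CODE)` reports the same number without any parsing, and setting CURLOPT_FOLLOWLOCATION to true makes cURL request the Location target automatically.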
wilbur_wc (Author) Posted August 31, 2010

Oh, damn. Now that I look at it, I see that it's redirecting in the same way it does when doing it all manually through the browser. For example, if you try the download before logging in, you are prompted for your login, but once the login is accepted it sends you to a different entry page (the URI listed in the 302 response), from which you have to re-navigate to the PDF download.

Is there a way to establish a session (the way a browser does once you've logged in) and then attempt the download? Or simply re-attempt the download without losing your logged-in status?

Thanks
wilbur_wc (Author) Posted August 31, 2010

Got it working. I needed to make one call to log in, store the cookie, and then attempt the download. I'm sure there's some redundancy in there, but it works. More info: http://www.php.net/manual/en/function.curl-setopt.php

    $agent = $_SERVER[ 'HTTP_USER_AGENT' ];
    $ref_url = "http://somesite.com"; // in case they don't allow automated logins
    $data = "handle=username&password=pass"; // syntax pulled from firebug's post

    // create an empty cookie file
    $fp = fopen( "cookie.txt", "w" );
    fclose( $fp );

    // first call: log in and store the session cookie in cookie.txt
    $curl = curl_init();
    curl_setopt( $curl, CURLOPT_URL, "http://somesite.com/login.php" );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
    curl_setopt( $curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC );
    curl_setopt( $curl, CURLOPT_USERPWD, "username:pass" );
    curl_setopt( $curl, CURLOPT_USERAGENT, $agent );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
    curl_setopt( $curl, CURLOPT_COOKIEFILE, "cookie.txt" );
    curl_setopt( $curl, CURLOPT_COOKIEJAR, "cookie.txt" );
    curl_setopt( $curl, CURLOPT_SSLVERSION, 3 ); // 3 = SSLv3
    curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, 0 ); // disables certificate checks
    curl_setopt( $curl, CURLOPT_SSL_VERIFYHOST, 0 );
    curl_setopt( $curl, CURLOPT_HEADER, true );
    curl_setopt( $curl, CURLOPT_POST, true );
    curl_setopt( $curl, CURLOPT_TIMEOUT, 40 );
    curl_setopt( $curl, CURLOPT_REFERER, $ref_url );
    curl_setopt( $curl, CURLOPT_POSTFIELDS, $data );
    ob_start();
    $result = curl_exec( $curl );
    $error = curl_error( $curl );
    ob_end_clean();
    // report the error after the buffer is discarded, otherwise the message is lost
    if( $error ) echo( "<br /><--- cURL ERROR:" . $error . " --->" );
    curl_close( $curl );
    //echo( "<br /><--- curl:" . $result . " --->" );

    // second call: request the PDF, sending the cookie saved by the login
    $curl = curl_init();
    curl_setopt( $curl, CURLOPT_URL, $this->pdfURL );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
    curl_setopt( $curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC );
    curl_setopt( $curl, CURLOPT_USERPWD, "username:pass" );
    curl_setopt( $curl, CURLOPT_USERAGENT, $agent );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
    curl_setopt( $curl, CURLOPT_COOKIEFILE, "cookie.txt" );
    curl_setopt( $curl, CURLOPT_COOKIEJAR, "cookie.txt" );
    curl_setopt( $curl, CURLOPT_SSLVERSION, 3 ); // 3 = SSLv3
    curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, 0 ); // disables certificate checks
    curl_setopt( $curl, CURLOPT_SSL_VERIFYHOST, 0 );
    curl_setopt( $curl, CURLOPT_HEADER, true );
    curl_setopt( $curl, CURLOPT_POST, true );
    curl_setopt( $curl, CURLOPT_TIMEOUT, 40 );
    curl_setopt( $curl, CURLOPT_REFERER, $ref_url );
    curl_setopt( $curl, CURLOPT_POSTFIELDS, $data );
    ob_start();
    $result = curl_exec( $curl );
    $error = curl_error( $curl );
    ob_end_clean();
    if( $error ) echo( "<br /><--- cURL ERROR:" . $error . " --->" );
    curl_close( $curl );
    echo( "<br /><--- curl:" . $result . " --->" );
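For readers who want a shorter variant of the two-step approach (log in, then download with the same cookies), here is a hedged sketch that reuses one cURL handle so the session cookie stays in memory. It is not the code from the thread: the helper names download_after_login and looks_like_pdf, and any URLs or form field names passed in, are placeholders, and a real site may need extra options such as a user agent or referer.

```php
<?php
// Sketch only: helper names are hypothetical, URLs and form fields are placeholders.

function looks_like_pdf(string $body): bool
{
    // PDF files begin with the marker "%PDF-".
    return strncmp($body, '%PDF-', 5) === 0;
}

function download_after_login(string $loginUrl, array $fields, string $pdfUrl): string
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return bodies as strings
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow 3xx redirects
    curl_setopt($curl, CURLOPT_COOKIEFILE, '');       // empty string: keep cookies in memory
    curl_setopt($curl, CURLOPT_TIMEOUT, 40);

    // Step 1: POST the login form; the session cookie stays on the handle.
    curl_setopt($curl, CURLOPT_URL, $loginUrl);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($fields));
    if (curl_exec($curl) === false) {
        throw new RuntimeException('Login failed: ' . curl_error($curl));
    }

    // Step 2: switch back to GET and request the PDF with the same cookies.
    curl_setopt($curl, CURLOPT_URL, $pdfUrl);
    curl_setopt($curl, CURLOPT_HTTPGET, true);
    $body = curl_exec($curl);
    if ($body === false) {
        throw new RuntimeException('Download failed: ' . curl_error($curl));
    }

    // A login page returned in place of the file would fail this check.
    if (!looks_like_pdf($body)) {
        throw new RuntimeException('Response is not a PDF; the login may not have been accepted.');
    }
    return $body;
}
```

CURLOPT_HEADER is deliberately left off here: with it enabled, the response headers are prepended to the returned string, so the result would no longer be a clean PDF that can be written out with `file_put_contents`.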