flyaw3 Posted March 31, 2017

Hi there,

I have been trying to fix this code, and after extensive research I have to borrow some brains (you guys) to help me fix it. :-)

The purpose of this script is to log in to a website and eventually download some files I need for work. I need the downloaded files on the server for later use, and I would like to schedule the script to run at certain hours of the day, since the data is always updating.

Checking the errors from cURL, I am getting a 503 and errno 22.

Thank you

This is my code so far:

<?php
//The username or email address of the account.
define('loginName', 'Username');
//The password of the account.
define('password', 'Password!');
//Set a user agent. This basically tells the server that we are using Chrome
define('USER_AGENT', 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.2309.372 Safari/537.36');
//Where our cookie information will be stored (needed for authentication).
define('COOKIE_FILE', 'cookie.txt');
//URL of the login form.
define('LOGIN_FORM_URL', 'https://www.orangefl.realtaxlien.com/index.cfm?folder=home');
//Login action URL. Sometimes, this is the same URL as the login form.
define('LOGIN_ACTION_URL', 'https://www.orangefl.realtaxlien.com/index.cfm?folder=summary');
//An associative array that represents the required form fields.
//You will need to change the keys / index names to match the name of the form
//fields.
$postValues = array(
    'Username' => loginName,
    'Password' => password
);
//Initiate cURL.
$curl = curl_init();
//Set the URL that we want to send our POST request to. In this
//case, it's the action URL of the login form.
curl_setopt($curl, CURLOPT_URL, LOGIN_ACTION_URL);
//Tell cURL that we want to carry out a POST request.
curl_setopt($curl, CURLOPT_POST, true);
//Set our post fields / data (from the array above).
curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($postValues));
//We don't want any HTTPS errors.
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
//Where our cookie details are saved. This is typically required
//for authentication, as the session ID is usually saved in the cookie file.
curl_setopt($curl, CURLOPT_COOKIEJAR, COOKIE_FILE);
//Sets the user agent. Some websites will attempt to block bot user agents.
//Hence the reason I gave it a Chrome user agent.
curl_setopt($curl, CURLOPT_USERAGENT, USER_AGENT);
//Tells cURL to return the output once the request has been executed.
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
//Allows us to set the referer header. In this particular case, we are
//fooling the server into thinking that we were referred by the login form.
curl_setopt($curl, CURLOPT_REFERER, LOGIN_FORM_URL);
//Do we want to follow any redirects?
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
//Execute the login request.
curl_exec($curl);
//Check for errors!
if (curl_errno($curl)) {
    throw new Exception(curl_error($curl));
}
if (@$_GET['curl'] == "yes") {
    header('HTTP/1.1 503 Service Temporarily Unavailable');
} else {
    $curl = curl_init($url = "http://" . $_SERVER['SERVER_NAME'] . $_SERVER['PHP_SELF'] . "?curl=yes");
    curl_setopt($curl, CURLOPT_FAILONERROR, true);
    $response = curl_exec($curl);
    $http_status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    $curl_errno = curl_errno($curl);
    if ($http_status == 503)
        echo "HTTP Status == 503 <br/>";
    echo "Curl Errno returned $curl_errno <br/>";
}
//We should be logged in by now. Let's attempt to access a password protected page
curl_setopt($curl, CURLOPT_URL, 'https://orangefl.realtaxlien.com/index.cfm?folder=itemsetcountyheld');
//Use the same cookie file.
curl_setopt($curl, CURLOPT_COOKIEJAR, COOKIE_FILE);
//Use the same user agent, just in case it is used by the server for session validation.
curl_setopt($curl, CURLOPT_USERAGENT, USER_AGENT);
//We don't want any HTTPS / SSL errors.
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
//Execute the GET request and print out the result.
echo curl_exec($curl);
?>
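About the "503 and errno 22" pair: with CURLOPT_FAILONERROR enabled, libcurl treats an HTTP status of 400 or higher as a failure and reports error 22 (CURLE_HTTP_RETURNED_ERROR), so the two numbers describe the same event, namely the server answering 503. A minimal sketch of reporting both values together; the helper names are hypothetical, not from the thread.

```php
<?php
// Hypothetical helpers for illustration only.
// With CURLOPT_FAILONERROR enabled, an HTTP status >= 400 makes curl_exec()
// return false and curl_errno() return 22 (CURLE_HTTP_RETURNED_ERROR).
// "503 plus errno 22" therefore means the server answered 503; it does not
// mean the network connection itself failed.

function describeResult(int $curlErrno, int $httpStatus): string
{
    if ($curlErrno === 22) {
        return "HTTP error status $httpStatus (cURL errno 22, CURLE_HTTP_RETURNED_ERROR)";
    }
    if ($curlErrno !== 0) {
        return "Transport error (cURL errno $curlErrno)";
    }
    return "Completed (HTTP status $httpStatus)";
}

// Performs one GET request and summarises how it ended.
function fetchWithDiagnostics(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_exec($ch);
    // The response code is still recorded when FAILONERROR aborts the transfer.
    return describeResult(curl_errno($ch), (int) curl_getinfo($ch, CURLINFO_HTTP_CODE));
}
```

For example, `echo fetchWithDiagnostics(LOGIN_FORM_URL);` prints one line that separates "the server said no" from "the request never got an answer".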
requinix Posted March 31, 2017

if (@$_GET['curl'] == "yes") {
    header('HTTP/1.1 503 Service Temporarily Unavailable');
} else {
    $curl = curl_init($url = "http://" . $_SERVER['SERVER_NAME'] . $_SERVER['PHP_SELF'] . "?curl=yes");
    curl_setopt($curl, CURLOPT_FAILONERROR, true);
    $response = curl_exec($curl);
    $http_status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    $curl_errno = curl_errno($curl);
    if ($http_status == 503)
        echo "HTTP Status == 503 <br/>";
    echo "Curl Errno returned $curl_errno <br/>";
}

What the heck is that? If ?curl=yes is set then it responds with a 503, and if not then it requests itself? Not only does that not make any sense, you're guaranteed to always get a 503 at this point.

And by the way, the login fields aren't "Username" and "Password". Not sure where you got those from, but those aren't the names I'm seeing.
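Rather than guessing the POST keys, they can be read from the login page itself. A minimal sketch using PHP's DOM extension (enabled in most builds): it lists each form's action and the `name` attributes of its controls. The function name is hypothetical, and the real markup of the login page is not shown in the thread.

```php
<?php
// Hypothetical helper: list every <form> in a page with its action and
// the names of its input/select/textarea controls.
function listFormFields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely well-formed; collect libxml warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $forms = [];
    foreach ($doc->getElementsByTagName('form') as $form) {
        $names = [];
        // Grouped by tag type, so the order may differ from the page order.
        foreach (['input', 'select', 'textarea'] as $tag) {
            foreach ($form->getElementsByTagName($tag) as $control) {
                $name = $control->getAttribute('name');
                if ($name !== '') { // unnamed controls are never submitted
                    $names[] = $name;
                }
            }
        }
        $forms[] = ['action' => $form->getAttribute('action'), 'fields' => $names];
    }
    return $forms;
}
```

Run it on the HTML returned by a plain GET of LOGIN_FORM_URL (for example `print_r(listFormFields($html));`) and use the printed names as the keys of `$postValues`.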
Jacques1 Posted April 1, 2017 (edited)

flyaw3: I've removed your nonsense. Insulting staff members is definitely above your pay grade, so either remember what mommy said about treating adults with respect, or go away.

Your code indeed makes no sense, and that's because you blindly copy and paste PHP snippets from the Internet without understanding any of them. If you think this is programming, you should probably find a new hobby.

Edited April 1, 2017 by Jacques1
elyfrank Posted April 1, 2017

Flyaw3, did you ever fix your code? Did you remove the code that didn't make sense?
flyaw3 (Author) Posted April 1, 2017

Yeah, I did, but I am still unable to scrape the page or navigate to a secured page. And by the way, the username field is actually LoginName. I will keep trying. Thanks for the reply.
requinix Posted April 1, 2017 (Solution)

Do you understand what that 503 block of code does? It stops the script from getting any further. I'm not sure you need the ?curl detection at all (I wouldn't think so from what you're describing), so I'd say get rid of that whole thing. Otherwise, if you're sure you have the right form fields, then the rest seems correct to me.
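With the 503 block gone, the login and the protected-page request can share one cURL handle, so the session cookie from the login is sent with the second request. (In the original, the `else` branch reassigns `$curl` to a fresh handle, which throws away the logged-in session.) A minimal sketch under stated assumptions: the username field is `LoginName` as reported in the thread, the password field name is a hypothetical placeholder, and both requests use the `www.` host because a host-only cookie set for `www.orangefl…` would not be sent to `orangefl…` (the original mixes the two). SSL verification is left at its default (enabled).

```php
<?php
// Assumptions: field names and URLs below are not confirmed by the site.
const LOGIN_ACTION_URL = 'https://www.orangefl.realtaxlien.com/index.cfm?folder=summary';
// Same host as the login on purpose (assumption about cookie scope).
const PROTECTED_URL = 'https://www.orangefl.realtaxlien.com/index.cfm?folder=itemsetcountyheld';
const COOKIE_FILE = __DIR__ . '/cookie.txt';
const PASSWORD_FIELD = 'Password'; // hypothetical: copy the real name from the form

function buildLoginFields(string $user, string $pass): string
{
    return http_build_query(['LoginName' => $user, PASSWORD_FIELD => $pass]);
}

function loginAndFetch(string $user, string $pass): string
{
    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE => COOKIE_FILE, // read cookies (also enables the cookie engine)
        CURLOPT_COOKIEJAR => COOKIE_FILE,  // write cookies when the handle is released
    ]);

    // 1) POST the credentials; the session cookie stays in this handle.
    curl_setopt_array($curl, [
        CURLOPT_URL => LOGIN_ACTION_URL,
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => buildLoginFields($user, $pass),
    ]);
    if (curl_exec($curl) === false) {
        throw new RuntimeException('Login failed: ' . curl_error($curl));
    }

    // 2) Reuse the same handle, switched back to GET, for the protected page.
    curl_setopt_array($curl, [CURLOPT_URL => PROTECTED_URL, CURLOPT_HTTPGET => true]);
    $html = curl_exec($curl);
    if ($html === false) {
        throw new RuntimeException('Fetch failed: ' . curl_error($curl));
    }
    return $html;
}
```

Calling `echo loginAndFetch('user', 'secret');` should print the protected page if the field names and URLs are right; if it prints the login page again, the field names or an intermediate step are the first things to check.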
flyaw3 (Author) Posted April 1, 2017

Hey requinix, I apologize to you; maybe I took it the wrong way. From now on I'll just shut up and listen. I already got rid of that piece of code (it shouldn't have been there in the first place), but I still can't navigate to the secure pages. There is a redirect just after the login page with a form button that needs to be clicked (using an onclick JavaScript handler), and that might be the problem. Thanks
requinix Posted April 1, 2017

It has JavaScript? That sucks. cURL can follow header redirects, but other redirects and JavaScript stuff you'll have to deal with yourself. As long as there aren't random values or anti-bot measures in place, it shouldn't be too hard. Can you add another cURL call for that second form? What is the stuff you have to deal with, since we can't see it for ourselves?
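The "another cURL call for that second form" idea can be sketched as two steps: read the intermediate page's form (its action, method and named inputs, including hidden ones), then submit it on the same handle. This assumes the onclick handler only submits an ordinary `<form>`; if the JavaScript computes values, they have to be reproduced by hand. Function names are hypothetical, and PHP 8+ is assumed.

```php
<?php
// Hypothetical helpers (PHP 8+). Assumption: the button's onclick just submits a normal form.
function extractContinueForm(string $html, string $pageUrl): ?array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $form = $doc->getElementsByTagName('form')->item(0); // first form only
    if ($form === null) {
        return null;
    }
    // Keep every named <input> with its value, hidden tokens included.
    $fields = [];
    foreach ($form->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return [
        'action' => resolveUrl($form->getAttribute('action'), $pageUrl),
        'method' => strtoupper($form->getAttribute('method') ?: 'GET'),
        'fields' => $fields,
    ];
}

// Minimal resolver: absolute, root-relative, query-only and same-directory
// actions. It does not handle "../" or protocol-relative "//host" forms.
function resolveUrl(string $action, string $pageUrl): string
{
    if ($action === '') {
        return $pageUrl; // an empty action submits back to the current page
    }
    if (preg_match('#^https?://#i', $action)) {
        return $action;
    }
    $p = parse_url($pageUrl);
    $origin = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    $path = $p['path'] ?? '/';
    if ($action[0] === '/') {
        return $origin . $action;
    }
    if ($action[0] === '?') {
        return $origin . $path . $action;
    }
    return $origin . substr($path, 0, strrpos($path, '/') + 1) . $action;
}

// Submits the extracted form; assumes CURLOPT_RETURNTRANSFER is already enabled.
function submitForm(CurlHandle $curl, array $form): string
{
    $query = http_build_query($form['fields']);
    if ($form['method'] === 'POST') {
        curl_setopt_array($curl, [
            CURLOPT_URL => $form['action'],
            CURLOPT_POST => true,
            CURLOPT_POSTFIELDS => $query,
        ]);
    } else {
        $sep = str_contains($form['action'], '?') ? '&' : '?';
        curl_setopt_array($curl, [
            CURLOPT_URL => $query === '' ? $form['action'] : $form['action'] . $sep . $query,
            CURLOPT_HTTPGET => true,
        ]);
    }
    $body = curl_exec($curl);
    if ($body === false) {
        throw new RuntimeException('Form submission failed: ' . curl_error($curl));
    }
    return $body;
}
```

After the login request, pass the returned HTML and the page's final URL (`curl_getinfo($curl, CURLINFO_EFFECTIVE_URL)` gives the URL after redirects) to `extractContinueForm()`, then call `submitForm()` with the same handle so the session cookies are reused.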
flyaw3 (Author) Posted April 1, 2017

Hi, I was able to navigate to the secure page. Now I have to find a way to download the files I need, which is going to be complicated for me, but I will do some research first before I start asking more questions. Thank you for your help. You can mark this as answered.
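For the download step, one common approach is to point the logged-in handle at the file URL and let cURL stream the body straight to disk with CURLOPT_FILE; a script doing this can then be run at the desired hours by a scheduler such as cron. A minimal sketch, assuming `$curl` is the handle that already carries the session cookies; the function names, URLs and target directory are hypothetical placeholders.

```php
<?php
// Hypothetical helpers (PHP 8+); $curl must already hold the session cookies.

// Derive a safe local file name from the URL path.
function fileNameFromUrl(string $url, string $fallback = 'download.bin'): string
{
    $name = basename((string) parse_url($url, PHP_URL_PATH));
    $name = preg_replace('/[^A-Za-z0-9._-]/', '_', $name);
    return ($name === '' || $name === '.' || $name === '..') ? $fallback : $name;
}

function downloadTo(CurlHandle $curl, string $url, string $dir): string
{
    $target = rtrim($dir, '/') . '/' . fileNameFromUrl($url);
    $fp = fopen($target, 'wb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $target for writing");
    }
    curl_setopt_array($curl, [
        CURLOPT_URL => $url,
        CURLOPT_HTTPGET => true,
        CURLOPT_FAILONERROR => true, // HTTP >= 400 fails with errno 22
        CURLOPT_FILE => $fp,         // write the body straight into the file
    ]);
    $ok = curl_exec($curl);
    $error = curl_error($curl);
    fclose($fp);
    // Switch the handle back to returning strings for later requests.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    if ($ok === false) {
        unlink($target); // don't keep a partial file or an error page
        throw new RuntimeException("Download failed: $error");
    }
    return $target;
}
```

If the site serves files through a generic URL such as `index.cfm?…`, the name derived from the path will not be meaningful, and a fixed name per report is simpler.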