stubarny Posted June 14, 2014 (edited)

Hi,

I need to log into a website to check for the latest data, and I want to automate this with a PHP script, run from a cron job, that logs in, retrieves the data and emails it to me.

Currently I'm stuck on the logging-in part (I'm testing the script below on my SquirrelMail login page). I've tried using the Snoopy PHP class (sourceforge.net/projects/snoopy/), but it seems the POST variables aren't being recognised: the email program sends me back to the login page instead of telling me that my username/password is wrong.

(Is it even possible for a server to accept POST variables in this way? I'm kind of surprised, but lots of people seem to use Snoopy to log in via user forms.)

Please could someone point me in the right direction?

Thanks,
Stu

    /* load the snoopy class and initialize the object */
    $snoopy = new Snoopy();

    /* set some values */
    $login_form['login_username'] = 'MY_USERNAME';
    $login_form['secretkey'] = 'MY_PASSWORD';

    $snoopy->cookies['vegetable'] = 'carrot';
    $snoopy->cookies['something'] = 'value';

    /* submit the data and get the result */
    /* $p_data is not defined above; the form fields were stored in $login_form */
    $snoopy->submit('http://webmail.MY_WEBSITE_NAME.com/src/redirect.php', $p_data);

    /* output the results */
    echo '<pre>' . htmlspecialchars($snoopy->results) . '</pre>';

Link to topic: https://forums.phpfreaks.com/topic/289159-logging-in-to-a-website-automatically-and-retrieving-the-data/
QuickOldCar Posted June 14, 2014

Use cURL. With it you can simulate logging in, cookies and POST requests.
stubarny Posted June 14, 2014

OK thanks, I've checked that my host has cURL (which it does) and have tried the code below. I remembered to add the hidden variables, but I still just get a blank page. Am I missing something?

Thanks,
Stu

    $url = "http://webmail.MY_WEBSITE.com/src/redirect.php";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    # POST variables
    $postdata = "login_username=USERNAME&secretkey=PASSWORD&js_autodetect_results=0&just_logged_in=1";
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
    curl_setopt($ch, CURLOPT_POST, 1);

    $output = curl_exec($ch);
    echo $output;
    curl_close($ch);
Jacques1 Posted June 14, 2014 (edited)

Not sure if you understand the general logic of a log-in.

The whole point of sending your credentials to the log-in script is to get back a session cookie pointing to an authenticated session. The body of the response is irrelevant. It's usually indeed empty, because the script tries to redirect you to the main page or whatever.

So what you want is to retrieve the session cookie, store it and then make another request with the cookie to the actual target page.

To store cookies, you need the CURLOPT_COOKIEJAR parameter. Its value is an arbitrary file path. To load cookies from a file and include them in the request, you need CURLOPT_COOKIEFILE.
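The two-request flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `login_options()`, `page_options()` and `run_request()` are hypothetical helper names, not part of cURL, and the URL, form fields and target page in the usage comment are placeholders based on this thread.

```php
<?php
// Sketch only: the function names below are hypothetical helpers, and the
// URLs and form fields passed to them are placeholders taken from this thread.

/** Options for the log-in POST. Cookies the server sets are saved to $cookieFile. */
function login_options(string $loginUrl, array $fields, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $loginUrl,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields), // URL-encodes the values
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEJAR      => $cookieFile,               // write cookies here
    ];
}

/** Options for the follow-up GET. Cookies are read back from $cookieFile. */
function page_options(string $pageUrl, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $pageUrl,
        CURLOPT_HTTPGET        => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEFILE     => $cookieFile,               // send stored cookies
    ];
}

/** Run one request and return the response body, or throw on a transport error. */
function run_request(array $options): string
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body  = curl_exec($ch);
    $error = curl_error($ch);
    // The cookie jar is written when the handle is released: by curl_close()
    // on PHP 7, or when $ch goes out of scope on PHP 8+.
    curl_close($ch);
    if ($body === false) {
        throw new RuntimeException("cURL request failed: $error");
    }
    return $body;
}

// Usage (placeholders, not executed here):
//   $cookie = __DIR__ . '/cookie.txt';
//   run_request(login_options('http://webmail.MY_WEBSITE.com/src/redirect.php',
//       ['login_username' => 'USERNAME', 'secretkey' => 'PASSWORD'], $cookie)); // body ignored
//   echo run_request(page_options($targetPageUrl, $cookie));                    // the page you want
```

The log-in response body is deliberately thrown away; only the cookie file matters for the second request. Building the POST body with `http_build_query()` is a design choice here so that special characters in the password are encoded correctly.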
stubarny Posted June 15, 2014 (edited)

Thanks, yes, my (very limited) cURL experience is about 2 hours in total! I've tried adding an extra section to retrieve the page after initialising the cookie, but I'm still getting a blank response with the code below. I guess I'm still doing something wrong?

    ...
    $username = "my_user";
    $password = "my_passs";
    $url = "http://webmail.WEBSITE.com/src/redirect.php";
    $cookie = "cookie.txt";
    $postdata = "login_username=USERNAME&secretkey=PASSWORD&js_autodetect_results=0&just_logged_in=1";

    # get the cookie
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
    curl_setopt($ch, CURLOPT_REFERER, $url);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
    curl_setopt($ch, CURLOPT_POST, 1);
    $result = curl_exec($ch);
    echo $result;
    curl_close($ch);

    # retrieve the inbox page
    $url = "http://webmail.WEBSITE.com/src/webmail.php";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); // read cookies from here
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_REFERER, $url);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
    curl_setopt($ch, CURLOPT_POST, 1);
    $result = curl_exec($ch);
    curl_close($ch);
QuickOldCar Posted June 16, 2014 (edited)

Does the cookie file have the correct permissions?
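One way to rule out a permissions problem is to check the cookie file before making any request. The sketch below assumes the system temp directory is an acceptable place for the jar; `writable_cookie_path()` is a hypothetical helper, not a cURL function. An absolute path is used because a relative name such as `cookie.txt` is resolved against the working directory, which under cron may not be the script's own directory.

```php
<?php
/**
 * Return an absolute cookie-jar path inside $dir, or throw if the file
 * cannot be created or written by the user running the script.
 */
function writable_cookie_path(string $dir, string $name = 'cookie.txt'): string
{
    $path = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name;

    // touch() creates the file if it does not exist yet.
    if (!@touch($path) || !is_writable($path)) {
        throw new RuntimeException("Cookie file is not writable: $path");
    }
    return $path;
}

// Assumption: the temp directory is writable by the cron user.
$cookie = writable_cookie_path(sys_get_temp_dir());
```

The resulting `$cookie` value can then be passed to both CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE.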
Jacques1 Posted June 16, 2014

It's again a problem with the logic:

You're still trying to output the response of the log-in request, but there's nothing to see there. Like I already said, the response probably doesn't say anything. It just tries to redirect you.

After the log-in request, you again make a POST request with your credentials, this time to the main page. This doesn't make any sense to me. Shouldn't this be a simple GET request with no parameters at all?

Before you write any line of code, get clear about what you want to do. Write it down or draw a diagram, if necessary. Then do the implementation. Don't start with a bunch of random code and try to fix it by trial and error.
stubarny Posted June 16, 2014

OK great, getting closer! I can see the structure (there's a frame divide where I expect it to be), so I'm convinced I'm logged in and the inbox webpage is being (partly) returned. But I'm getting two errors returned:

    Not Found
    The requested URL /left_main.php was not found on this server.
    Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

    Not Found
    The requested URL /right_main.php was not found on this server.
    Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

I don't really see why the files wouldn't be found due to an incorrect file path, because I thought cURL just requests the webpage. If that webpage requires other files, aren't those files called server side? (I've visited the SquirrelMail inbox manually and it definitely works.)

Here's the code that's almost working:

    $url = "http://webmail.WEBSITE.com/src/redirect.php";
    $cookie = "cookie.txt";
    $postdata = "login_username=USERNAME&secretkey=PASSWORD&js_autodetect_results=0&just_logged_in=1";

    # get the cookie
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
    curl_setopt($ch, CURLOPT_REFERER, $url);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
    curl_setopt($ch, CURLOPT_POST, 1);
    $result = curl_exec($ch);
    curl_close($ch);

    # retrieve the inbox page
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); // read cookies from here
    curl_setopt($ch, CURLOPT_URL, "http://webmail.WEBSITE.com/src/webmail.php");
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_exec($ch);
    curl_close($ch);
QuickOldCar Posted June 16, 2014

Do you need to fix the relative links?

    /left_main.php
    /right_main.php
stubarny Posted June 16, 2014 (edited)

Yes, but I'm not calling those links (so I can't fix them?). Doesn't the email login page call them on the server side? I don't see why my using cURL makes the relative links fail.
Jacques1 Posted June 16, 2014 (Solution)

The two URLs are frame sources. When you render the cURL response in your browser, the browser tries to load the frame content from /left_main.php and /right_main.php respectively. Those relative URLs are resolved according to the current host, which is localhost. And you obviously don't have such scripts on your localhost.

You simply passed the wrong URL to cURL. You don't need the URL of some page with frames on it, you need the URL of the actual data you want. So if the mails are on, say, right_main.php, then you pass that URL to cURL.
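To find the URL to pass to cURL, the frame's `src` value has to be resolved against the address of the frameset page on the remote host, not against the host that runs your script. A minimal sketch of that resolution, assuming a hypothetical host `webmail.example.test` and a hypothetical helper `resolve_url()`; it handles only absolute, root-relative and same-directory references, not `../` segments or `//host` forms:

```php
<?php
/**
 * Resolve a frame/link reference against the URL of the page it came from.
 * Simplified: no "../" normalisation and no protocol-relative ("//host") support.
 */
function resolve_url(string $base, string $relative): string
{
    // Already absolute (has a scheme): nothing to do.
    if (parse_url($relative, PHP_URL_SCHEME) !== null) {
        return $relative;
    }

    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Root-relative reference: keep only scheme, host and port of the base.
    if ($relative !== '' && $relative[0] === '/') {
        return $origin . $relative;
    }

    // Same-directory reference: replace the last path segment of the base.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $relative;
}

// The frameset page that was fetched, and a frame source found in it.
$target = resolve_url('http://webmail.example.test/src/webmail.php', 'right_main.php');
echo $target, "\n";
```

The resolved address is what the second cURL request (the one that sends the stored cookies) should fetch, instead of the frameset page itself.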
stubarny Posted June 16, 2014

Many thanks Jacques, it's working perfectly.