
logging in to a website automatically and retrieving the data


stubarny
Solved by Jacques1

Hi,

 

I need to log into a website to check for the latest data, and I want to automate this with a PHP script, run as a cron job, that logs in, retrieves the data and emails it to me.

 

Currently I'm stuck on the logging-in part (I'm testing the script below on my SquirrelMail login page). I've tried using the Snoopy PHP class (sourceforge.net/projects/snoopy/), but it seems the POST variables aren't being recognised: the email program sends me back to the login page instead of telling me that my username/password is wrong.

 

(Is it even possible for a server to accept POST variables in this way? I'm kind of surprised, but lots of people seem to use Snoopy to log in via forms.)

 

Please could someone point me in the right direction?

 

Thanks,

 

Stu

 

 

/* load the Snoopy class and initialize the object */
require 'Snoopy.class.php';
$snoopy = new Snoopy();

/* set the form fields to submit */
$login_form['login_username'] = 'MY_USERNAME';
$login_form['secretkey'] = 'MY_PASSWORD';

$snoopy->cookies['vegetable'] = 'carrot';
$snoopy->cookies['something'] = 'value';

/* submit the form fields defined above and get the result */
$snoopy->submit('http://webmail.MY_WEBSITE_NAME.com/src/redirect.php', $login_form);

/* output the results */
echo '<pre>' . htmlspecialchars($snoopy->results) . '</pre>';


OK thanks, I've checked that my host has cURL (it does) and have tried the code below. I remembered to add the hidden form fields, but I still just get a blank page. Am I missing something?

 

Thanks,

 

Stu

$url= "http://webmail.MY_WEBSITE.com/src/redirect.php";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);

#    POST variables
    $postdata = "login_username=USERNAME&secretkey=PASSWORD&js_autodetect_results=0&just_logged_in=1";

    curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata);
    curl_setopt ($ch, CURLOPT_POST, 1);

$output = curl_exec($ch);
echo $output;
curl_close($ch);

Not sure if you understand the general logic of a log-in.

 

The whole point of sending your credentials to the log-in script is to get back a session cookie pointing to an authenticated session. The body of the response is irrelevant. In fact, it's usually empty, because the script tries to redirect you to the main page or whatever.

 

So what you want is to retrieve the session cookie, store it and then make another request with the cookie to the actual target page.

 

To store cookies, you need the CURLOPT_COOKIEJAR option. Its value is an arbitrary file path. To load cookies from a file and include them in the request, you need CURLOPT_COOKIEFILE.
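
As a rough sketch of that two-step flow (not a drop-in solution): the function names `buildLoginBody`, `newSessionHandle`, `logIn` and `fetchPage` are made up for illustration, and the cookie file path is whatever writable path you choose. Only the CURLOPT_* constants are standard PHP cURL options.

```php
<?php
/** Encode the form fields as an application/x-www-form-urlencoded body. */
function buildLoginBody(array $fields): string
{
    return http_build_query($fields, '', '&');
}

/**
 * Create a cURL handle that sends the cookies stored in $cookieFile and
 * writes received cookies back to it when the handle is released.
 */
function newSessionHandle(string $url, string $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // store cookies here
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // load cookies from here
    return $ch;
}

/** Step 1: POST the credentials. Only the cookie matters, not the body. */
function logIn(string $loginUrl, array $fields, string $cookieFile): bool
{
    $ch = newSessionHandle($loginUrl, $cookieFile);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, buildLoginBody($fields));
    $ok = curl_exec($ch) !== false;
    unset($ch); // releasing the handle flushes the cookie jar to disk
    return $ok;
}

/** Step 2: plain GET of the target page, sending the stored cookie. */
function fetchPage(string $pageUrl, string $cookieFile)
{
    $ch = newSessionHandle($pageUrl, $cookieFile);
    $body = curl_exec($ch);
    unset($ch);
    return $body; // string on success, false on failure
}
```

Usage would then be something like `logIn($loginUrl, ['login_username' => '...', 'secretkey' => '...'], 'cookie.txt')` followed by `fetchPage($targetUrl, 'cookie.txt')`. On older PHP versions, `curl_close($ch)` serves the same purpose as the `unset($ch)` calls.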


Thanks, yes my (very limited) curl experience is about 2 hours in total!

 

I've tried adding an extra section to retrieve the page after obtaining the cookie, but I'm still getting a blank response with the code below. I guess I'm still doing something wrong?

$username="my_user"; 
$password="my_passs"; 
$url="http://webmail.WEBSITE.com/src/redirect.php"; 
$cookie="cookie.txt"; 

$postdata = "login_username=USERNAME&secretkey=PASSWORD&js_autodetect_results=0&just_logged_in=1"; 

#	get the cookie
$ch = curl_init(); 
curl_setopt ($ch, CURLOPT_URL, $url); 
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE); 
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"); 
curl_setopt ($ch, CURLOPT_TIMEOUT, 60); 
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 0); 
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie); 
curl_setopt ($ch, CURLOPT_REFERER, $url); 

curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata); 
curl_setopt ($ch, CURLOPT_POST, 1); 
$result = curl_exec ($ch); 

echo $result;  

curl_close($ch);



#	retrieve the inbox page
$url="http://webmail.WEBSITE.com/src/webmail.php"; 

$ch = curl_init(); 

curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); //read cookies from here

curl_setopt ($ch, CURLOPT_URL, $url); 
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE); 
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"); 
curl_setopt ($ch, CURLOPT_TIMEOUT, 60); 
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 0); 
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt ($ch, CURLOPT_REFERER, $url); 

curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata); 
curl_setopt ($ch, CURLOPT_POST, 1); 
$result = curl_exec ($ch); 



curl_close($ch);

It's again a problem with the logic:

  • You're still trying to output the response of the log-in request, but there's nothing to see there. As I already said, the response body is probably empty. It just tries to redirect you.
  • After the log-in request, you again make a POST request with your credentials, this time to the main page. This doesn't make any sense to me. Shouldn't this be a simple GET request with no parameters at all?
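
If you want to check what the log-in request actually did, the status line and headers are more telling than the body. Here is a small sketch, assuming you have the raw header block as a string (for example by enabling CURLOPT_HEADER with CURLOPT_FOLLOWLOCATION off). The function name `summariseLoginResponse`, the cookie name and the sample response are all made up for illustration.

```php
<?php
/**
 * Summarise a raw HTTP response header block: status code, redirect
 * target and the names of any cookies the server set.
 */
function summariseLoginResponse(string $rawHeaders): array
{
    $summary = ['status' => null, 'location' => null, 'cookies' => []];

    foreach (preg_split('/\r\n|\n/', trim($rawHeaders)) as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $summary['status'] = (int) $m[1];
        } elseif (preg_match('/^Location:\s*(.+)$/i', $line, $m)) {
            $summary['location'] = trim($m[1]);
        } elseif (preg_match('/^Set-Cookie:\s*([^=;\s]+)=/i', $line, $m)) {
            $summary['cookies'][] = $m[1];
        }
    }
    return $summary;
}

// Made-up example of what a log-in script might send back.
$example = "HTTP/1.1 302 Found\r\n"
         . "Set-Cookie: SESSID=abc123; path=/\r\n"
         . "Location: webmail.php\r\n"
         . "Content-Length: 0\r\n";

print_r(summariseLoginResponse($example));
```

With a real request, the header block can be cut from the response using `curl_getinfo($ch, CURLINFO_HEADER_SIZE)`. A redirect status plus a Set-Cookie header suggests the log-in went through, while a redirect back to the log-in page suggests it did not.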

Before you write any line of code, get clear about what you want to do. Write it down or draw a diagram, if necessary. Then do the implementation. Don't start with a bunch of random code and try to fix it by trial-and-error.


OK great, getting closer!

 

I can see the structure (there's a frame divide where I expect it to be), so I'm convinced I'm logged in and the inbox page is being (partly) returned. But I'm getting two errors returned:

 

Not Found

The requested URL /left_main.php was not found on this server.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

 

Not Found

The requested URL /right_main.php was not found on this server.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

 

 

 

I don't really see why the files wouldn't be found because of an incorrect file path: I thought cURL just requests the web page, and if that page requires other files, aren't those files fetched server-side? (I've visited the SquirrelMail inbox manually and it definitely works.)

 

Here's the code that's almost working:

	$url="http://webmail.WEBSITE.com/src/redirect.php"; 
	$cookie="cookie.txt"; 
	
	$postdata = "login_username=USERNAME&secretkey=PASSWORD&js_autodetect_results=0&just_logged_in=1"; 
	
	#	get the cookie
	$ch = curl_init(); 
	curl_setopt ($ch, CURLOPT_URL, $url); 
	curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE); 
	curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"); 
	curl_setopt ($ch, CURLOPT_TIMEOUT, 60); 
	curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 0); 
	curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1); 
	curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie); 
	curl_setopt ($ch, CURLOPT_REFERER, $url); 
	
	curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata); 
	curl_setopt ($ch, CURLOPT_POST, 1); 
	$result = curl_exec ($ch);  

	curl_close($ch);



#	retrieve the inbox page
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); //read cookies from here
curl_setopt($ch, CURLOPT_URL, "http://webmail.WEBSITE.com/src/webmail.php");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);


  • Solution

The two URLs are frame sources. When you render the cURL response in your browser, the browser tries to load the frame content from /left_main.php and /right_main.php respectively. Those relative URLs are resolved against the current host, which is localhost. And you obviously don't have such scripts on your localhost.

 

You simply passed the wrong URL to cURL. You don't need the URL of some page with frames on it, you need the URL of the actual data you want. So if the mails are on, say, right_main.php, then you pass that URL to cURL.
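
If you don't know the frame URLs in advance, one option is to read them from the frameset page and resolve them against that page's URL yourself, as the browser would on the real host. The following is only a sketch: `resolveFrameUrl` and `frameUrls` are hypothetical helpers, the resolver covers only the simple cases (absolute, root-relative and same-directory paths), and it assumes PHP's DOM extension is available.

```php
<?php
/**
 * Resolve a frame's src against the URL of the page that contains it.
 * Handles only absolute URLs, root-relative paths and paths relative
 * to the page's directory.
 */
function resolveFrameUrl(string $pageUrl, string $src): string
{
    if (preg_match('#^https?://#i', $src)) {
        return $src; // already absolute
    }
    $parts  = parse_url($pageUrl);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if (strpos($src, '/') === 0) {
        return $origin . $src; // root-relative
    }
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1); // keep trailing slash
    return $origin . $dir . $src;
}

/** Return the absolute URLs of all <frame> and <iframe> elements in $html. */
function frameUrls(string $html, string $pageUrl): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings about sloppy markup
    $urls = [];
    foreach (['frame', 'iframe'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $src = $node->getAttribute('src');
            if ($src !== '') {
                $urls[] = resolveFrameUrl($pageUrl, $src);
            }
        }
    }
    return $urls;
}
```

Each resulting URL can then be fetched with the same cookie file as before, and the response parsed for the data you actually want.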

