doncoglioni Posted March 21, 2008

Hi all! I have a script which analyses HTML documents and summarises some information about them, related to my work. The script itself works fine if I have access to the HTML pages concerned. However, most of the ones I need are only accessible after I log in to a (public) website using a username and password form.

So my question is this: can a PHP script automatically "enter" the username and password, log into the website, and then (remaining authenticated for that session) fetch the protected pages that I need, just as if a human were sitting there doing the same thing? No captchas are involved. Obviously I would need some information about the target page's username/password form and its method of submission... but how would I go about doing this, generally?

My sincere thanks for any help.
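Since the field names and submission method have to be known before anything can be posted, here is a minimal sketch of one way to read them out of the login page with PHP's built-in DOMDocument and DOMXPath classes. It assumes the page's HTML is already in a string and that the login form is the first form containing a password input; the function name describeLoginForm is hypothetical.

```php
<?php
// Hypothetical helper: report where a login form posts to and which
// fields it sends, so the same request can be rebuilt with cURL.
function describeLoginForm(string $html): ?array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//form') as $form) {
        // Assumption: the login form is the one with a password input.
        if ($xpath->query('.//input[@type="password"]', $form)->length === 0) {
            continue;
        }
        $fields = [];
        foreach ($xpath->query('.//input[@name]', $form) as $input) {
            // Hidden inputs often carry values the server expects back.
            $fields[$input->getAttribute('name')] = $input->getAttribute('value');
        }
        return [
            'action' => $form->getAttribute('action'),
            // HTML forms default to GET when no method is given.
            'method' => strtoupper($form->getAttribute('method') ?: 'GET'),
            'fields' => $fields,
        ];
    }
    return null; // no form with a password field was found
}
```

Hidden inputs are worth a close look: some sites reject a login POST that leaves out a value the form carried.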
cooldude832 Posted March 21, 2008

cURL can be used to make your server log in as a registered, session-based user on many sites.
doncoglioni Posted March 21, 2008

Thank you for the reply. I just noticed that cURL seems ideal. I found this, and it might be what I'm looking for, but if you have any suggestions, please let me know: http://www.wagerank.com/2007/how-to-submit-forms-with-php/
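A minimal sketch of that general approach, assuming an ordinary HTML login form: POST the fields with cURL and let a cookie jar file hold whatever session cookie the site sets. The function names buildLoginBody and postLogin, and any URL or field names you pass to them, are hypothetical. Building the body with http_build_query() rather than by hand means a password containing & or = is still sent correctly.

```php
<?php
// Hypothetical helper: URL-encode every field name and value.
function buildLoginBody(array $fields): string
{
    return http_build_query($fields);
}

// Hypothetical helper: submit a login form and return the response body.
function postLogin(string $url, array $fields, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => buildLoginBody($fields),
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies saved here when the handle is released
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies read from here for requests
        CURLOPT_FOLLOWLOCATION => true,        // follow the post-login redirect
        CURLOPT_RETURNTRANSFER => true,        // return the page instead of printing it
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }
    // $ch is released when the function returns, which writes the cookie jar.
    return $html;
}
```

For example, buildLoginBody(['username' => 'bob', 'password' => 'a&b=c']) produces username=bob&password=a%26b%3Dc.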
cunoodle2 Posted March 21, 2008

I wrote a script that logged into my Chase bank website and grabbed information on my IRA, then stored the value in a database so that I could track it more easily. Here is all the code, with the obvious usernames and such blocked out:

```php
<?php
######### Set up field values #########
$fields  = "authmethod=userpassword&";
$fields .= "locale=en_us&";
$fields .= "usr_name=bobby&";
$fields .= "usr_password=3456&";
$fields .= "hiddenuri=/online/logon/on_successful_logon.jsp?LOB=COLLogon&";
$fields .= "LOB=COLLogon";
$agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)";
$ref   = "https://chaseonline.chase.com/colappmgr/colportal/prospect?_nfpb=true&_pageLabel=page_logonform";

######### Prepare curl settings and variables #########
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://chaseonline.chase.com/siteminderagent/forms/formpost.fcc");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);      // verify the bank's SSL certificate
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');   // save cookies the server sets
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');  // send them back on later requests
curl_setopt($ch, CURLOPT_MAXREDIRS, 4);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_REFERER, $ref);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);      // return the page into $buffer instead of printing it
$buffer = curl_exec($ch);
curl_close($ch);
if ($buffer === false) {
    die("Request failed");
}

######### Search html code for the needed string #########
// find the location of the market value
$temp = "Market Value</td><td class=";
$response_start = strpos($buffer, $temp);
$response_mid   = ($response_start === false) ? false : strpos($buffer, '$', $response_start);
$response_end   = ($response_mid === false) ? false : strpos($buffer, " </td><td", $response_mid);
if ($response_end === false) {
    die("Market value not found in the page");
}
$temp_code = substr($buffer, $response_mid + 1, $response_end - $response_mid - 1);
$temp_code = str_replace(",", "", $temp_code);      // strip thousands separators

######### minor error checking and database insert #########
// check to make sure that some long error or other bad data was not returned
if (strlen($temp_code) < 15) {
    ######## Insert Values into database ########
    // connect to the server and select the database (PDO throws on failure)
    $db = new PDO("mysql:host=localhost;dbname=ira_daderbase", "user", "password",
                  [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

    $now   = time();
    $today = mktime(0, 0, 0);   // midnight today

    // this is done to make sure that there is only 1 insert per day
    $stmt = $db->prepare("SELECT COUNT(*) FROM `retirement` WHERE `time` > ?");
    $stmt->execute([$today]);
    if ((int) $stmt->fetchColumn() === 0) {
        $insert = $db->prepare("INSERT INTO `retirement` (`time`, `amount`) VALUES (?, ?)");
        $insert->execute([$now, $temp_code]);
    }
}
?>
```

Hope that example helps you out.
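If the page layout shifts slightly, strpos()/substr() offsets like the ones in the code above can silently grab the wrong text. As an alternative sketch, a single regular expression can anchor on the label and capture only a well-formed amount. It assumes the amount sits in the table cell directly after the label and starts with a dollar sign, as in the post above; extractAmount is a hypothetical name.

```php
<?php
// Hypothetical helper: find "<label></td><td ...>$1,234.56" and return
// the number without thousands separators, or null if it is not there.
function extractAmount(string $html, string $label): ?string
{
    $pattern = '/' . preg_quote($label, '/')
             . '<\/td>\s*<td[^>]*>\s*\$\s*([\d,]+(?:\.\d+)?)/i';
    if (!preg_match($pattern, $html, $m)) {
        return null; // label or amount missing: treat as a failed scrape
    }
    return str_replace(',', '', $m[1]);
}
```

Returning null on a miss gives the caller a clear signal to skip the database insert, instead of relying on the length check.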
doncoglioni Posted March 21, 2008

cunoodle2, you're an absolute genius. I have no idea what I was doing wrong before, but now my script is logging me in to the website.

However, all that happens is that the "successful login" page is displayed, and if I try to access any further pages within the members' area, it acts as if I'm not logged in at all. It's as if it logs me in for a second, then immediately logs me out again. Can you think of a reason for this?

Thank you very much, either way!
doncoglioni Posted March 21, 2008

I thought I'd also post my code so you can see what's going on, and what's going wrong (in the comments!):

```php
####
// Logging into the secure site, just passing username/password, referer and useragent.
####
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://secure.site.com/page1_userlogin.php");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'C:\cookies.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'C:\cookies.txt');
curl_setopt($ch, CURLOPT_MAXREDIRS, 4);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'username=abcdefg&password=1234567');
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9');
curl_setopt($ch, CURLOPT_REFERER, 'http://secure.site.com/welcome_page.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
$buffer = curl_exec($ch);
curl_close($ch);

####
// Now I'm logged in, I should be able to access other secure pages (to parse them with my parser!)
####
####
// BUT I CAN'T - I parse the standard "not logged in" page instead :'(
####
include("phpHTMLParser.php");
$parser = new phpHTMLParser(file_get_contents("https://secure.site.com/page2_this_is_the_target.php"));
$HTMLObject = $parser->parse();
$HTMLObject = $parser->parse_tags(array("div"));
$HTMLObject->output();
?>
```
cooldude832 Posted March 21, 2008

If you can't get any farther, odds are your server isn't storing the cookies and session data it receives, so on the next request (a "refresh", so to speak) it is logged out again. You might need to look through the cURL documentation on php.net for an answer to staying authenticated.
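One likely cause in the code above, assuming the site does rely on cookies: file_get_contents() makes a completely separate HTTP request that knows nothing about cURL's cookie jar, so the target page is requested as an anonymous visitor. A sketch of the alternative is to make both requests through the same cURL handle, which resends the session cookie automatically. Setting CURLOPT_COOKIEFILE to an empty string turns on cURL's in-memory cookie handling without reading a file. The function name fetchAfterLogin and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: log in, then fetch a protected page with the SAME
// cURL handle so the session cookie from step 1 is sent in step 2.
function fetchAfterLogin(string $loginUrl, array $fields, string $targetUrl): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_COOKIEFILE     => '',   // empty string: enable the in-memory cookie engine
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ]);

    // Step 1: submit the login form.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login failed: ' . curl_error($ch));
    }

    // Step 2: switch back to GET and request the protected page.
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $targetUrl);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Fetch failed: ' . curl_error($ch));
    }
    return $html; // hand this string to the HTML parser instead of file_get_contents()
}
```

The returned string can then be passed to the parser in place of the file_get_contents() call.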
lordfrikk Posted March 21, 2008

Or you might pass the cookie with your login details as a parameter. You need to find out what the cookie contains before doing so, though. For example, in Opera you can turn on "Ask before accepting cookies", and it will show you the content of the cookie when you log in.
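A sketch of that idea with cURL: CURLOPT_COOKIE sends a fixed Cookie header built from name/value pairs copied out of the browser. The helper names and any cookie names are hypothetical, and a copied session cookie normally stops working once that browser session expires or you log out.

```php
<?php
// Hypothetical helper: turn ['name' => 'value', ...] into "name=value; ..."
function buildCookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Hypothetical helper: fetch a page while presenting the copied cookies.
function fetchWithCookie(string $url, array $cookies): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_COOKIE         => buildCookieHeader($cookies), // sent as the Cookie: header
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $html;
}
```

For example, buildCookieHeader(['PHPSESSID' => 'abc123', 'lang' => 'en']) gives PHPSESSID=abc123; lang=en.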
doncoglioni Posted March 22, 2008

Thanks, guys. I found out the problem was that I wasn't following the "correct" login procedure for my website. On further examination, I noticed it doesn't use cookies to keep track of its visitors, and that if you go back to the home page, you're logged out automatically!

So I got the HTTPHeaders extension for Firefox and copied all the headers as I moved through every page (home page -> login -> welcome page -> account page -> my target page). Now I have 4 sets of cURL commands, one for each step between those pages, each sending the headers I captured.
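The multi-step approach described above can be sketched as a loop over the same page sequence a browser follows, reusing one cURL handle and sending the captured headers at each step. Everything in the step list (URLs, headers, POST fields) is a hypothetical placeholder to be filled in from what the header-capturing extension shows, and runSteps is a hypothetical name. If the site carries its session in the URL or in a hidden form field rather than a cookie, each step would also need to pull that value out of the previous response, which this sketch does not do.

```php
<?php
// Hypothetical helper: replay a browser's page sequence with one handle.
// Each step: ['url' => ..., 'headers' => [...], 'post' => [...] (optional)]
function runSteps(array $steps): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => '', // harmless if the site sets no cookies
    ]);

    $html = '';
    $previousUrl = null;
    foreach ($steps as $step) {
        curl_setopt($ch, CURLOPT_URL, $step['url']);
        curl_setopt($ch, CURLOPT_HTTPHEADER, $step['headers'] ?? []);
        if ($previousUrl !== null) {
            curl_setopt($ch, CURLOPT_REFERER, $previousUrl); // as if a link was clicked
        }
        if (isset($step['post'])) {
            // Setting POSTFIELDS makes this step a POST request.
            curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($step['post']));
        } else {
            curl_setopt($ch, CURLOPT_HTTPGET, true);
        }
        $html = curl_exec($ch);
        if ($html === false) {
            throw new RuntimeException("Step {$step['url']} failed: " . curl_error($ch));
        }
        $previousUrl = $step['url'];
    }
    return $html; // body of the last page in the chain
}

// Example (hypothetical URLs and fields):
// $html = runSteps([
//     ['url' => 'https://secure.site.com/'],
//     ['url' => 'https://secure.site.com/page1_userlogin.php',
//      'post' => ['username' => 'abcdefg', 'password' => '1234567']],
//     ['url' => 'https://secure.site.com/page2_this_is_the_target.php'],
// ]);
```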
cunoodle2 Posted March 24, 2008

If you don't mind, please post your code in the hope that someone in the future will be able to learn from it. Make sure you block out all usernames, passwords and other important information. It took me forever to figure out that cURL stuff to log into my bank, as I wrote it all "from scratch." Hopefully my code helped you, and your code will help the next person. =)
doncoglioni Posted March 24, 2008

I'd be absolutely delighted to, as soon as I get one tiny thing fixed with it. Please do give my other (short) post a read; I'd love your help. It's right here: http://www.phpfreaks.com/forums/index.php/topic,188902.0.html

Ooh, and for the benefit of us all: how do you get the fancy colored code? Mine's all greyscale.

Thanks again!
Coreye Posted March 24, 2008

Make sure you add the PHP tags, <?php and ?>, to the code you post.