natasha_thomas Posted December 4, 2010

Folks,

I need to do some scraping from a site. The problem is, I have to be logged in to that site first to access any content.

Link: https://www.majesticseo.com/account/login?redirect=%2Faccount%2Flogin

Login details are:
Email: wow@mailinator.com
Password: natashaworld

What PHP code do I need to use to log in to this site, so I can continue running my other code to scrape some data?

Cheers
Natasha T
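A first step toward "what code do I need" is finding out what the login form actually submits: its `action` URL and the names of its input fields (the replies below post `LoginEmail` and `LoginPassword`, which presumably came from the form's HTML). The sketch below, assuming PHP's bundled DOM extension, lists every named `<input>` in the first form on a page, including hidden ones such as anti-forgery tokens that would have to be sent back with the POST. The function name `login_form_fields()` and the sample markup (including its `token` field) are hypothetical, not taken from majesticseo.com.

```php
<?php
// Sketch: list the action and input fields of the first <form> in a page,
// so a cURL POST can reuse the same field names.
// login_form_fields() and the sample HTML are hypothetical.

function login_form_fields(string $html): array
{
    if (trim($html) === '') {
        return ['action' => null, 'fields' => []]; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $form = $xpath->query('//form')->item(0);
    if ($form === null) {
        return ['action' => null, 'fields' => []];
    }

    $fields = [];
    // Every named <input> with its default value; hidden inputs (tokens)
    // must be posted back unchanged for many login forms to accept the request.
    foreach ($xpath->query('.//input[@name]', $form) as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }

    return ['action' => $form->getAttribute('action'), 'fields' => $fields];
}

$sample = '<form action="/account/login" method="post">'
        . '<input type="hidden" name="token" value="abc123">'
        . '<input type="text" name="LoginEmail">'
        . '<input type="password" name="LoginPassword">'
        . '</form>';

print_r(login_form_fields($sample));
```

In practice the HTML would come from a first GET of the login page; whether this particular site used hidden fields is not stated in the thread.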
natasha_thomas Posted December 4, 2010 Author

Will it be something like this?

<?php
$ch = curl_init();
// log in: POST the form fields; COOKIEJAR writes received cookies to cookie.txt when the handle is closed
curl_setopt($ch, CURLOPT_URL, "https://www.majesticseo.com/account/login?redirect=%2Faccount%2Flogin");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "LoginEmail=wow@mailinator.com&LoginPassword=natashaworld");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$login = curl_exec($ch);

// second request on the same handle (same login URL; CURLOPT_POST is still set)
curl_setopt($ch, CURLOPT_URL, "https://www.majesticseo.com/account/login?redirect=%2Faccount%2Flogin");
$members = curl_exec($ch);
curl_close($ch);
?>

1- Is this code correct?
2- How can I confirm that I am logged in? This code will run on my server, not in my browser...

Cheers
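One way to approach both questions is to POST the login form and then request a page that is only visible when logged in, on the same cURL handle so the session cookie is sent back, switching back to GET with `CURLOPT_HTTPGET`. Confirming the login can then be as crude as searching the returned page for text an anonymous visitor would not see; a later reply reports seeing "Welcome natasha thomas" after a successful login, so "Welcome" is used as the default marker here. This is a minimal sketch under those assumptions: the helper names are hypothetical, the members-page URL is something you would have to supply, and the field names are the ones used in this thread.

```php
<?php
// Sketch: log in with a POST, then fetch a members-only page on the SAME
// cURL handle so the session cookie goes back to the server.
// build_login_body(), looks_logged_in() and login_and_fetch() are hypothetical helpers.

function build_login_body(string $email, string $password): string
{
    // http_build_query() URL-encodes the values ("@" becomes "%40").
    return http_build_query(['LoginEmail' => $email, 'LoginPassword' => $password]);
}

function looks_logged_in(string $html, string $marker = 'Welcome'): bool
{
    // Crude check: the logged-in page shows a greeting the login page does not.
    return stripos($html, $marker) !== false;
}

function login_and_fetch(string $loginUrl, string $memberUrl, string $body, string $cookieFile): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved here when the handle closes
        CURLOPT_COOKIEFILE     => $cookieFile, // enables the cookie engine for this handle
    ]);

    // Step 1: submit the login form.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
    if (curl_exec($ch) === false) {
        curl_close($ch);
        return null;
    }

    // Step 2: switch back to GET and load the protected page.
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $memberUrl);
    $page = curl_exec($ch);
    curl_close($ch);

    return $page === false ? null : $page;
}
```

Usage would look like `$page = login_and_fetch($loginUrl, $memberUrl, build_login_body($email, $password), tempnam(sys_get_temp_dir(), 'cookie'));` followed by `looks_logged_in($page)` when `$page` is not null. Checking for a marker string is fragile if the site changes its wording, but it needs no browser, which fits a script running on a server.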
QuickOldCar Posted December 4, 2010

Am I wrong in thinking that you don't need to log in to the backend there? Aren't you supposed to upload some sort of site, and from then on use your own login credentials from whatever platform or scripts you end up with?

Just for the heck of it, I tried using cURL to log in with your information, and it didn't work. I then modified the code to use https and my own scraper. What seems to happen is that the site always wants to redirect back to its main page for a check, and I'm not sure how to handle that through cURL.
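To see where such redirects actually go, it can help to look at the `Location` headers. When both `CURLOPT_HEADER` and `CURLOPT_FOLLOWLOCATION` are enabled, cURL places the header block of every hop in front of the final body in the string `curl_exec()` returns, so the chain can be read back out of it. The sketch below does that; `redirect_chain()` and the sample response are hypothetical, and it assumes header blocks end with a blank `\r\n\r\n` line as HTTP/1.x responses do.

```php
<?php
// Sketch: pull the Location targets out of a raw response fetched with both
// CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION enabled.
// redirect_chain() and the sample response are hypothetical.

function redirect_chain(string $response): array
{
    $targets = [];
    // Each header block ends with a blank line; the body follows the last block.
    foreach (explode("\r\n\r\n", $response) as $block) {
        if (strncmp($block, 'HTTP/', 5) !== 0) {
            break; // not a status line, so this is the body
        }
        // Header names are case-insensitive, hence /i; /m lets ^ match each line.
        if (preg_match('/^Location:\s*(\S+)/im', $block, $m)) {
            $targets[] = $m[1];
        }
    }
    return $targets;
}

$sample = "HTTP/1.1 302 Found\r\nLocation: https://www.example.com/\r\n\r\n"
        . "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
        . "<html>home page</html>";

print_r(redirect_chain($sample));
```

If only the final address matters, `curl_getinfo($ch, CURLINFO_EFFECTIVE_URL)` returns the URL cURL ended up on after following redirects.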
QuickOldCar Posted December 4, 2010

The code below works: I can read "Welcome natasha thomas". You should change your login password on the site now, before someone does harm to your account.

Here is a link to the working code: http://get.blogdns.com/dynaindex/testscrape

I just echoed the HTML. You will see that the href links point to my own site; that's because the site uses relative links instead of full http:// URLs. You would have to fix those, maybe with DOM; I had to do that for my page parser.

<?php
$url = "https://www.majesticseo.com/account/login?redirect=%2Faccount";

/* log in with curl and keep the session cookies in a temporary file */
$cookie = tempnam(sys_get_temp_dir(), 'cookie');

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);  // write cookies to this file when the handle is closed
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); // read cookies from the same file
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3');
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, "LoginEmail=wow@mailinator.com&LoginPassword=natashaworld");
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
curl_setopt($ch, CURLOPT_MAXREDIRS, 15);
curl_setopt($ch, CURLOPT_HEADER, true);          // include the response headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_FILETIME, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow the redirects that come after the login
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skips certificate verification (insecure)
curl_setopt($ch, CURLOPT_ENCODING, "");          // accept any compression curl supports

/* send the login request once and keep the page it ends on */
$html = curl_exec($ch);
$info = curl_getinfo($ch);

/* if there was a curl error - job ends and back to url insert area */
if (!$html) {
    ?>
    <br /><b><font color="red">No url inserted:</font></b>
    <br /><b><font color="orange">Please try another url, that website may not exist. The url may or may not require the www.</font></b><br />
    <b><font color="orange"><?php echo htmlspecialchars(curl_error($ch)); ?></font></b><br />
    <?php
    curl_close($ch);
    exit;
}
curl_close($ch);

/* curl already followed any Location: redirects, so the effective url is the final one;
   also check the page for a JavaScript redirect */
if (preg_match("/window\.location\.replace\('(.*)'\)/i", $html, $value) ||
    preg_match("/window\.location\=[\"'](.*)[\"']/i", $html, $value) ||
    preg_match("/location\.href\=[\"'](.*)[\"']/i", $html, $value)) {
    $finalurl = $value[1];
} else {
    $finalurl = $info['url'];
}

/* parse the url into the main domain name */
function get_main_domain($temp_main_domain)
{
    $domain_parts = explode('/', $temp_main_domain);
    if ($domain_parts[0] == 'http:' || $domain_parts[0] == 'https:') {
        $temp_main_domain = $domain_parts[2];
    } else {
        $temp_main_domain = $domain_parts[0];
    }
    $domain_parts = explode('.', $temp_main_domain);
    $positions = count($domain_parts) - 3;
    if (strlen($domain_parts[$positions + 2]) == 2) {
        // two-letter ending (e.g. example.co.uk): keep the last three labels
        $final_main_url = $domain_parts[$positions] . '.' . $domain_parts[$positions + 1] . '.' . $domain_parts[$positions + 2];
    } elseif (strlen($domain_parts[$positions + 2]) == 0) {
        // host ends with a dot
        $final_main_url = $domain_parts[$positions] . '.' . $domain_parts[$positions + 1];
    } else {
        $final_main_url = $domain_parts[$positions + 1] . '.' . $domain_parts[$positions + 2];
    }
    return $final_main_url;
}

$final_main_parsed_host = strtolower(get_main_domain($finalurl));
echo "Main Parsed Host: $final_main_parsed_host<br />";

/* some sites resolve to an all-uppercase url, so lowercase the host and rebuild the url */
$scheme   = parse_url($finalurl, PHP_URL_SCHEME) ?: 'http';
$host     = strtolower((string) parse_url($finalurl, PHP_URL_HOST));
$port     = parse_url($finalurl, PHP_URL_PORT);
$path     = (string) parse_url($finalurl, PHP_URL_PATH);
$query    = parse_url($finalurl, PHP_URL_QUERY);
$fragment = parse_url($finalurl, PHP_URL_FRAGMENT);

$url = $scheme . '://' . $host
     . ($port ? ":$port" : '')
     . $path
     . ($query !== null ? "?$query" : '')
     . ($fragment !== null ? "#$fragment" : '');

echo "Resolved: $finalurl<br />";
echo "Lowercased: $url<br />";

/* report the http status of the final response */
$code       = $info['http_code'];
$valid      = array_merge(range(200, 207), range(300, 307));
$invalid    = array_merge(range(400, 426), range(500, 507), array(510));
$redirected = array(300, 301, 302, 303, 307);
$direct     = range(200, 207);

if (in_array($code, $valid)) {
    echo '<b><font color="lime">Connection OK</font></b>';
}
if (in_array($code, $invalid)) {
    echo '<b><font color="red">Connection Error</font></b>';
}
echo "<br />";
if (in_array($code, $redirected)) {
    echo '<b><font color="orange">Redirection</font></b>';
}
if (in_array($code, $direct)) {
    echo '<font color="lime">Direct Connection</font><br />';
    echo $html;
}
echo "<br />";
?>
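The relative-link problem described above (echoed links pointing at the scraper's own site) can be handled by rewriting `href`/`src` attributes to absolute URLs before echoing the page. Below is a minimal sketch with PHP's DOM extension; `resolve_href()` and `absolutize_links()` are hypothetical names, and the resolver is deliberately simple: it handles absolute, protocol-relative (`//host/x`), root-relative (`/x`), query-only (`?x`) and plain relative (`x`) links, but does not collapse `../` segments, so it is not a full RFC 3986 implementation.

```php
<?php
// Sketch: make relative links in scraped HTML absolute, so they point back at
// the original site instead of the page that re-displays the HTML.
// resolve_href() and absolutize_links() are hypothetical helpers; "../" is not collapsed.

function resolve_href(string $href, string $base): string
{
    // Leave empty links, in-page anchors and anything with a scheme
    // (http:, https:, mailto:, javascript: ...) untouched.
    if ($href === '' || $href[0] === '#' || preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;
    }
    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    if (strncmp($href, '//', 2) === 0) {
        return $scheme . ':' . $href;                 // protocol-relative
    }
    $origin = $scheme . '://' . ($parts['host'] ?? '')
            . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $path = $parts['path'] ?? '/';
    if ($href[0] === '/') {
        return $origin . $href;                       // root-relative
    }
    if ($href[0] === '?') {
        return $origin . $path . $href;               // query-only
    }
    // Plain relative link: resolve against the directory of the base path.
    return $origin . substr($path, 0, strrpos($path, '/') + 1) . $href;
}

function absolutize_links(string $html, string $base): string
{
    if (trim($html) === '') {
        return $html; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Rewrite the URL-carrying attribute of common tags.
    foreach (['a' => 'href', 'link' => 'href', 'img' => 'src', 'script' => 'src'] as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $el) {
            if ($el->hasAttribute($attr)) {
                $el->setAttribute($attr, resolve_href($el->getAttribute($attr), $base));
            }
        }
    }
    return $doc->saveHTML();
}

echo absolutize_links('<a href="/account">Account</a>', 'https://www.majesticseo.com/account/login');
```

The base URL should be the page the HTML actually came from (for the code above, `$finalurl`), since relative links are resolved against it.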
natasha_thomas Posted December 4, 2010 Author

Very Sweet! Yes this code works. Funny thing is, someone already changed my password.
Not sure who. LOL

BTW, your Old car is really Quick with PHP code.

Cheers
Natasha T