Folks,

I need to scrape some data from a site. The problem is that I have to be logged in to the site before I can access any content.

Link:

Login details are:

Email: wow@mailinator.com

Password: natashaworld

What PHP code do I need to log in to this site, so that I can go on running my other code to scrape some data?

Cheers

Natasha T
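For reference, a common pattern for logging in to a form-based site with PHP's cURL extension is to POST the login form once with a cookie file, then reuse that same cookie file for every later request. The sketch below is a minimal illustration, assuming the form field names `LoginEmail` and `LoginPassword` that appear in the code later in this thread; `build_login_body()` and `curl_login()` are hypothetical helper names, not part of any library.

```php
<?php
// Hypothetical helper: URL-encode the login form fields so characters such as
// '@', '&' or spaces in the email or password are sent correctly.
// Field names are assumed from the code later in this thread.
function build_login_body($email, $password)
{
    return http_build_query([
        'LoginEmail'    => $email,
        'LoginPassword' => $password,
    ]);
}

// Hypothetical helper: POST the login form and keep the session cookies in $cookieFile.
// Returns the response body, or false on a cURL error.
function curl_login($loginUrl, $body, $cookieFile)
{
    $ch = curl_init($loginUrl);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $body,
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved here when the handle is released
        CURLOPT_COOKIEFILE     => $cookieFile, // and read from here on each request
        CURLOPT_RETURNTRANSFER => true,        // return the page instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // many login forms answer with a redirect
    ]);
    $page = curl_exec($ch);
    curl_close($ch);
    return $page;
}
```

Every later request (for the pages you actually want to scrape) would then set `CURLOPT_COOKIEFILE` to the same `$cookieFile` so the session cookie is sent back.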

https://forums.phpfreaks.com/topic/220628-how-to-login-to-a-site-with-php/

Will it be something like this?

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.majesticseo.com/account/login?redirect=%2Faccount%2Flogin");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "LoginEmail=wow@mailinator.com&LoginPassword=natashaworld");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$login = curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, "https://www.majesticseo.com/account/login?redirect=%2Faccount%2Flogin");
$members = curl_exec($ch);
curl_close($ch);
?>

 

1. Is this code correct?

2. How can I confirm that I am logged in? This code will run on my server, not in my browser.

Cheers
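On the two questions above, a hedged reading of the code: (1) the second `curl_exec()` goes to the same login URL with `CURLOPT_POST` still set on the handle, so it submits the login form again instead of fetching a members-only page; and without `CURLOPT_FOLLOWLOCATION`, a login that answers with a redirect leaves only that redirect response in `$login`. (2) Because the script runs on a server, one practical check is to look for text that only a logged-in user sees (later in this thread, the working version shows "Welcome natasha thomas") and to note where the request finally ended up. A minimal sketch; `is_logged_in()` and `fetch_members_page()` are hypothetical helpers, and the marker text is an assumption you would adapt to the site:

```php
<?php
// Hypothetical helper: treat the session as logged in when the page contains text
// that only a logged-in user sees. The marker is an assumption; pick something
// specific to the target site, such as a "Welcome <name>" greeting or a logout link.
function is_logged_in($html, $marker)
{
    return is_string($html) && stripos($html, $marker) !== false;
}

// Hypothetical helper: fetch a members-only page with a plain GET, sending the
// cookies that the login request saved in $cookieFile.
// Returns [page body (or false), final URL after redirects].
function fetch_members_page($url, $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_HTTPGET        => true,        // a GET, not a second POST of the login form
        CURLOPT_COOKIEFILE     => $cookieFile, // send the session cookies from the login step
        CURLOPT_COOKIEJAR      => $cookieFile, // keep any cookies the site updates
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
    ]);
    $page = curl_exec($ch);
    // Where the request ended up; the login page here usually means "not logged in".
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return [$page, $finalUrl];
}
```

For example, after logging in with the same cookie file, `list($page, $finalUrl) = fetch_members_page($membersUrl, $cookieFile);` followed by `is_logged_in($page, 'Welcome')` (where `$membersUrl` is a hypothetical members-only URL). If that returns false and `$finalUrl` is the login page, the site has sent the script back to the login form.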

Am I wrong that you don't need to log in to the backend there? Aren't you supposed to upload some sort of site, and from then on use your own login credentials for whatever platform or scripts you end up with?

I did try, for the heck of it, to log in with your details using cURL; it didn't work.

I then modified the code to use HTTPS and my own scraper. What seems to happen is that the site always redirects back to its main page for a check, and I'm not sure how to handle that through cURL.
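One possible cause of a scripted login bouncing back "for a check" (an assumption; nothing in this thread confirms it for this site) is a login form with hidden fields, such as a CSRF token or a redirect target, which a browser submits automatically but a hand-built POST body leaves out. A minimal sketch that collects every hidden input from the login page with PHP's DOM extension; `hidden_inputs()` is a hypothetical helper:

```php
<?php
// Hypothetical helper: collect name => value pairs of all <input type="hidden">
// elements in an HTML page, so they can be sent along with the login fields.
function hidden_inputs($html)
{
    if (trim($html) === '') {
        return [];                       // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);    // real-world HTML is rarely valid; suppress parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        if (strtolower($input->getAttribute('type')) === 'hidden' && $input->hasAttribute('name')) {
            $fields[$input->getAttribute('name')] = $input->getAttribute('value');
        }
    }
    return $fields;
}
```

To use it, fetch the login page with a GET first (using the same cookie file), then build the body with `http_build_query(['LoginEmail' => $email, 'LoginPassword' => $password] + hidden_inputs($loginPage))`. With PHP's `+` array operator the left-hand keys win, so a hidden field with the same name never overwrites the real credentials.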

The code below works; I can read "Welcome natasha thomas".

You should now change your login password on the site before someone does harm to your account.

Here is a link to the working code:

http://get.blogdns.com/dynaindex/testscrape

I just echoed the HTML.

You will see that the href links point to my own site. That's because the site uses relative links instead of full http:// URLs. You would have to fix those, maybe with DOM; I had to do that for my page parser.

<?php

$url = "https://www.majesticseo.com/account/login?redirect=%2Faccount";


        /*connect to the url using curl to see if exists and get the information*/
        $cookie = tempnam(sys_get_temp_dir(), 'cookie');
        $cookie_file_path = $cookie; // read cookies back from the same file the jar writes to, not a directory
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3');
        curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt ($ch, CURLOPT_POSTFIELDS, "LoginEmail=wow@mailinator.com&LoginPassword=natashaworld");
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 15);
        curl_setopt($ch, CURLOPT_HEADER, 1);
        curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt ($ch, CURLOPT_FILETIME, 1);
        curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
        curl_setopt($ch, CURLOPT_ENCODING , "");
        $curl_session = curl_init();
        curl_setopt($curl_session, CURLOPT_COOKIEJAR, $cookie);
        curl_setopt($curl_session, CURLOPT_COOKIEFILE, $cookie_file_path);
        curl_setopt($curl_session, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3');
        curl_setopt($curl_session, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
        curl_setopt($curl_session, CURLOPT_POST, true);
        curl_setopt ($curl_session, CURLOPT_POSTFIELDS, "LoginEmail=wow@mailinator.com&LoginPassword=natashaworld");
        curl_setopt($curl_session, CURLOPT_ENCODING , "");
        curl_setopt($curl_session, CURLOPT_TIMEOUT, 15);
        curl_setopt($curl_session, CURLOPT_HEADER, 1);
        curl_setopt($curl_session, CURLOPT_SSL_VERIFYPEER, FALSE);
        curl_setopt($curl_session, CURLOPT_HEADER, true);
        curl_setopt($curl_session, CURLOPT_MAXREDIRS, 15);
        curl_setopt($curl_session, CURLOPT_RETURNTRANSFER, true);
        curl_setopt( $curl_session, CURLOPT_AUTOREFERER, true );
        curl_setopt ($curl_session, CURLOPT_HTTPGET, true);
        curl_setopt($curl_session, CURLOPT_URL, $url);
        // mysql_real_escape_string() is for SQL queries (and was removed in PHP 7); the page is only echoed here
        $string = curl_exec($curl_session);
        $html = curl_exec($ch);
        $info = curl_getinfo($ch);
        /*curl response check and to resolve url to the actual location*/
        $response = curl_getinfo( $ch );
        /*CURLOPT_FOLLOWLOCATION is on, so curl has already followed any HTTP 301/302
          redirects and $response['url'] is the final (effective) URL*/
        $finalurl = $response['url'];

        /*the page may still redirect with JavaScript; if so, take the target from the script*/
        if (    preg_match("/window\.location\.replace\('(.*)'\)/i", $html, $value) ||
                preg_match("/window\.location\=[\"'](.*)[\"']/i", $html, $value) ||
                preg_match("/location\.href\=[\"'](.*)[\"']/i", $html, $value)
        ) {
            $finalurl = $value[1];
        }
        
        $html = curl_exec($ch);
        $header = "Location: ";

        /*parse the url into the main domain name*/
        function get_main_domain($temp_main_domain) {
            $domain_parts = explode('/', $temp_main_domain);
            if ($domain_parts[0]=='http:' || $domain_parts[0]=='https:') {
                $temp_main_domain= $domain_parts[2];
            } else {
                $temp_main_domain= $domain_parts[0];
            }
            unset($domain_parts);
            $domain_parts = explode('.', $temp_main_domain);
            $positions=count($domain_parts);
            $positions-=3;
            if (strlen($domain_parts[($positions+2)])==2) {
                $final_main_url=$domain_parts[$positions].'.'.$domain_parts[($positions+1)].'.'.$domain_parts[($positions+2)];
            } else if (strlen($domain_parts[($positions+2)])==0) {
                $final_main_url=$domain_parts[($positions)].'.'.$domain_parts[($positions+1)];
            } else {
                $final_main_url=$domain_parts[($positions+1)].'.'.$domain_parts[($positions+2)];
            }
            return $final_main_url;
        }
        $final_main_parsed_host = get_main_domain($finalurl);
        $final_main_parsed_host = strtolower($final_main_parsed_host);
        echo "Main Parsed Host: $final_main_parsed_host<br />";


        /*because stupid people resolve their sites to all uppercase - i have to check it and attempt to fix it*/
        $checknew_parse_url = $finalurl;
        function checkgetparsedHost($checknew_parse_url) {
            $checkparsedUrl = parse_url(trim($checknew_parse_url));
            return trim(isset($checkparsedUrl['host']) ? $checkparsedUrl['host'] : explode('/', $checkparsedUrl['path'], 2)[0]);
        }
        $checkget_parse_url = parse_url($checknew_parse_url, PHP_URL_HOST);
        $checkhost_parse_url = str_replace(array('Www.','WWW.'), 'www.', $checkget_parse_url);
        $checkhost_parse_url = strtolower($checkhost_parse_url);
        $checkport_parse_url = parse_url($checknew_parse_url, PHP_URL_PORT);
        $checkuser_parse_url = parse_url($checknew_parse_url, PHP_URL_USER);
        $checkpass_parse_url = parse_url($checknew_parse_url, PHP_URL_PASS);
        $checkget_path_parse_url = parse_url($checknew_parse_url, PHP_URL_PATH);
        $checkpath_parse_url = str_replace(array('Www.','WWW.'), 'www.', $checkget_path_parse_url);
        $checkquery_parse_url = parse_url($checknew_parse_url, PHP_URL_QUERY);
        $checkquery_parse_url = "?$checkquery_parse_url";
        $checkquery_parse_url = rtrim($checkquery_parse_url, '#');
        $checkfragment_parse_url = parse_url($checknew_parse_url, PHP_URL_FRAGMENT);
        $checkfragment_parse_url = "#$checkfragment_parse_url";
        $checkhostpath_url = "$checkhost_parse_url$checkpath_parse_url";
        $checkhostpath_url = rtrim($checkhostpath_url, '?');
        $checkquery_parse_url = rtrim($checkquery_parse_url, '?');

        $checkhostpathquery_url = "$checkhost_parse_url$checkpath_parse_url$checkquery_parse_url";

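        $checkuserinfo_url = $checkuser_parse_url ? $checkuser_parse_url . ($checkpass_parse_url ? ":$checkpass_parse_url" : '') . '@' : '';
        $checkportinfo_url = $checkport_parse_url ? ":$checkport_parse_url" : '';
        $checkcomplete_url = "$checkuserinfo_url$checkhost_parse_url$checkportinfo_url$checkpath_parse_url$checkquery_parse_url$checkfragment_parse_url";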
        $checkcomplete_url = rtrim($checkcomplete_url, '#');
        $url = "http://$checkcomplete_url";
        echo "Resolved: $finalurl";
        echo "";
        echo "<br />";
        echo "Lowercased: $url";
        echo "";
        $md5_url = md5($url);
        print "<br />";
        echo "";
       
        /*if was a curl error - job ends and back to url insert area*/
        if (!$html) {
        ?>

           <br /><B><FONT COLOR=red>No URL inserted:</FONT></B>
           <br /><B><FONT COLOR=orange>Please try another URL; that website may not exist. The URL may or may not require the www.</FONT></B><br />
           <?php
            exit;
        }
        if (curl_errno($ch)) {
        ?>
            <B><FONT COLOR=orange> <?php echo htmlspecialchars(curl_error($ch)); ?> </FONT></B><br />
            <?php
        }
        else {
        ?>
            <br />
             <?php
            $errmsg  = curl_error($ch);
            curl_close($ch);
            $valid = array(200, 201, 202, 203, 204, 205, 206, 207, 300, 301, 302, 303, 304, 305, 306, 307);
            if (in_array($info['http_code'], $valid)) {
            ?>
                <B><FONT COLOR=lime>Connection OK</FONT></B>
                <?php
            }
            $invalid = array(400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 500, 501, 502, 503, 504, 505, 506, 507, 510);
            if (in_array($info['http_code'], $invalid)) {
            ?>
               <B><FONT COLOR=red>Connection Error</FONT></B>
               <?php
            }
            ?>
           <br />
           <?php
            $redirected = array(300, 301, 302, 303, 307);
            if (in_array($info['http_code'], $redirected)) {
            ?>
                <B><FONT COLOR=orange>Redirection</FONT></B>
                <?php
            }
            $redirectedno = array(200, 201, 202, 203, 204, 205, 206, 207);
            if (in_array($info['http_code'], $redirectedno)) {
            
               echo "<B><FONT COLOR=lime> Direct Connection</FONT></B><br />";
               
               echo $html;
               
            }

            print <<<END
            <br />
END;
}

?>  
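The answer above notes that the echoed page's href links point back to the answerer's own server because the scraped site uses relative links, and suggests fixing them with DOM. A minimal sketch of that idea, assuming you know the URL the page was fetched from; `absolutize()` and `absolutize_links()` are hypothetical helpers, and the resolution is deliberately simplified (no `../` normalisation, no `<base href>` support, and `#...`/`?...` links are left as-is):

```php
<?php
// Hypothetical helper: turn a relative href into an absolute URL against $base.
function absolutize($href, $base)
{
    if ($href === '' || preg_match('~^[a-z][a-z0-9+.-]*:~i', $href) || $href[0] === '#' || $href[0] === '?') {
        return $href;                                   // already absolute (http:, mailto:, ...) or same-page
    }
    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if (substr($href, 0, 2) === '//') {
        return $scheme . ':' . $href;                   // protocol-relative: //host/path
    }
    if ($href[0] === '/') {
        return $origin . $href;                         // root-relative: /path
    }
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);  // directory of the base page
    return $origin . $dir . $href;                      // document-relative: page.html
}

// Rewrite every <a href> in $html so the links point at the scraped site.
function absolutize_links($html, $base)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);                   // tolerate invalid real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $a->setAttribute('href', absolutize($a->getAttribute('href'), $base));
        }
    }
    return $doc->saveHTML();
}
```

In the script above, this would be applied to the body only, for example `echo absolutize_links($body, $finalurl);`, since `$html` there also contains the HTTP headers (`CURLOPT_HEADER` is on). `$body` is a hypothetical variable holding the page without headers.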

Very sweet!

Yes, this code works.

Funny thing is, someone already changed my password. Not sure who. LOL

BTW, your old car is really quick with PHP code. ;D

Cheers

Natasha T
