hollowdra Posted August 8, 2010

Hi guys, I was searching for an advanced way to detect search bots and found an old post (2008) in the forum. I think it's good, but I need an expert's opinion. This is the code:

function is_this_a_real_msnbot($remote_host_ip)
{
    // http://blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx
    // http://en.wikipedia.org/wiki/Forward_Confirmed_reverse_DNS
    $the_host_should_be="livebot-";
    $the_host_should_be.=str_replace(".", "-", $remote_host_ip);
    $the_host_should_be.=".search.live.com";
    if ($the_host_should_be==gethostbyaddr($remote_host_ip)) {
        //If reverse DNS lookup looks good then proceed to
        foreach (gethostbynamel(gethostbyaddr($remote_host_ip)) as $realip) { ///Forward Confirmed reverse DNS
            if ($realip==$remote_host_ip) {return TRUE;}
        }
    } else {return FALSE;}
}

function is_this_a_real_YahooSlurp($remote_host_ip)
{
    // http://www.seroundtable.com/archives/013781.html
    // http://en.wikipedia.org/wiki/Forward_Confirmed_reverse_DNS
    $the_host_should_be=".crawl.yahoo.net";
    if ($the_host_should_be==substr(gethostbyaddr($remote_host_ip), -16)) {
        //If reverse DNS lookup looks good then proceed to
        foreach (gethostbynamel(gethostbyaddr($remote_host_ip)) as $realip) { ///Forward Confirmed reverse DNS
            if ($realip==$remote_host_ip) {return TRUE;}
        }
    } else {return FALSE;}
}

function is_this_a_real_GoogleBot($remote_host_ip)
{
    // http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html
    // http://en.wikipedia.org/wiki/Forward_Confirmed_reverse_DNS
    $the_host_should_be=".googlebot.com";
    if ($the_host_should_be==substr(gethostbyaddr($remote_host_ip), -14)) {
        //If reverse DNS lookup looks good then proceed to
        foreach (gethostbynamel(gethostbyaddr($remote_host_ip)) as $realip) { ///Forward Confirmed reverse DNS
            if ($realip==$remote_host_ip) {return TRUE;}
        }
    } else {return FALSE;}
}

function is_this_a_real_Alexa_ia_archiver($remote_host_ip)
{
    $the_host_should_be=".alexa.com";
    if ($the_host_should_be==substr(gethostbyaddr($remote_host_ip), -10)) {
        //If reverse DNS lookup looks good then proceed to
        foreach (gethostbynamel(gethostbyaddr($remote_host_ip)) as $realip) { ///Forward Confirmed reverse DNS
            if ($realip==$remote_host_ip) {return TRUE;}
        }
    } else {return FALSE;}
}

function is_this_a_real_ArchiveORG_ia_archiver($remote_host_ip)
{
    $the_host_should_be=".archive.org";
    if ($the_host_should_be==substr(gethostbyaddr($remote_host_ip), -12)) {
        //If reverse DNS lookup looks good then proceed to
        foreach (gethostbynamel(gethostbyaddr($remote_host_ip)) as $realip) { ///Forward Confirmed reverse DNS
            if ($realip==$remote_host_ip) {return TRUE;}
        }
    } else {return FALSE;}
}

function is_this_a_valid_web_crawler($remote_host_ip)
{
    //This function should return TRUE as soon as possible since it's testing to see if an IP address belongs to a valid web crawler.
    if (is_this_a_real_msnbot($remote_host_ip)) {return TRUE;}
    elseif (is_this_a_real_GoogleBot($remote_host_ip)) {return TRUE;}
    elseif (is_this_a_real_Alexa_ia_archiver($remote_host_ip)) {return TRUE;}
    elseif (is_this_a_real_ArchiveORG_ia_archiver($remote_host_ip)) {return TRUE;}
    else {return FALSE;}
}

So what do you think, is it good to use on a commercial website?

JasonLewis Posted August 9, 2010

Google around a bit, take a look at:

http://stackoverflow.com/questions/677419/how-to-detect-search-engine-bots-with-php
http://ditio.net/2008/09/07/detecting-search-engine-bots-with-php/
http://www.insanevisions.com/article/214/Tutorials/Bot-Detection-with-PHP/
http://sandaldjepit.com/2009/detect-search-engine-robot-name/

Plenty of information on building your own script to check for bots, or you could refine a Google search for pre-made scripts that do what you want. I'm sure there would be a decent script out there.
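
For anyone building their own script from the code in the first post: the five per-bot functions all repeat the same forward-confirmed reverse DNS steps (reverse lookup, hostname check, forward lookup, compare), so they could probably be folded into one helper. What follows is only a minimal sketch under stated assumptions, not a tested drop-in replacement. The function name `is_verified_crawler`, the suffix list, and the two injectable resolver parameters are hypothetical names introduced here for illustration. Only `gethostbyaddr()` and `gethostbynamel()` come from the original post. The sketch matches on hostname suffix for every bot, which is looser than the exact `livebot-...` hostname comparison the original uses for msnbot.

```php
<?php
// Sketch only: the function name, the suffix list, and the injectable
// resolver parameters are illustrative assumptions, not code from the thread.

/**
 * Forward-confirmed reverse DNS check for a crawler IP address.
 *
 * $reverse and $forward default to PHP's own resolvers; they are parameters
 * only so the logic can be exercised without real DNS traffic.
 */
function is_verified_crawler($ip, array $allowed_suffixes,
                             $reverse = 'gethostbyaddr',
                             $forward = 'gethostbynamel')
{
    // Step 1: reverse lookup. gethostbyaddr() hands back the unmodified IP
    // when no hostname is found, and false on malformed input.
    $host = call_user_func($reverse, $ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // Step 2: the hostname must end with one of the expected suffixes.
    $suffix_ok = false;
    foreach ($allowed_suffixes as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            $suffix_ok = true;
            break;
        }
    }
    if (!$suffix_ok) {
        return false;
    }

    // Step 3: forward lookup. gethostbynamel() returns false when the name
    // does not resolve, so guard before searching the result.
    $ips = call_user_func($forward, $host);
    if (!is_array($ips)) {
        return false;
    }

    // Step 4: confirmed only if the original IP is among the forward results.
    return in_array($ip, $ips, true);
}
```

A call might look like `is_verified_crawler($_SERVER['REMOTE_ADDR'], array('.googlebot.com', '.crawl.yahoo.net', '.search.live.com', '.alexa.com', '.archive.org'))`, with the suffixes taken from the functions in the first post. Two things this sketch handles that the original does not: it always returns an explicit boolean (the original functions fall through without a return value when the forward lookup does not confirm the IP), and it does not pass a failed `gethostbynamel()` result to `foreach`. Note also that `is_this_a_valid_web_crawler()` in the first post never calls `is_this_a_real_YahooSlurp()`, whereas a suffix list like the one above would cover Yahoo as well. Each call does blocking DNS lookups, so on a busy site it would probably be worth caching the result per IP.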