hollowdra
Posted August 8, 2010

Hi guys, I was searching for an advanced way to detect search bots and found an old post (2008) in the forum. I think it's good, but I need an expert's opinion. This is the code:

function is_this_a_real_msnbot($remote_host_ip) {
    // http://blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx
    // http://en.wikipedia.org/wiki/Forward_Confirmed_reverse_DNS
    $the_host_should_be = "livebot-";
    $the_host_should_be .= str_replace(".", "-", $remote_host_ip);
    $the_host_should_be .= ".search.live.com";
    $host = gethostbyaddr($remote_host_ip);
    if ($the_host_should_be == $host) {
        // If reverse DNS lookup looks good then proceed to Forward Confirmed reverse DNS
        $ips = gethostbynamel($host);
        // gethostbynamel() returns FALSE when the name does not resolve
        if ($ips !== FALSE) {
            foreach ($ips as $realip) {
                if ($realip == $remote_host_ip) {return TRUE;}
            }
        }
    }
    // Reached when either lookup fails or no forward address matches
    return FALSE;
}

function is_this_a_real_YahooSlurp($remote_host_ip) {
    // http://www.seroundtable.com/archives/013781.html
    // http://en.wikipedia.org/wiki/Forward_Confirmed_reverse_DNS
    $the_host_should_be = ".crawl.yahoo.net";
    $host = gethostbyaddr($remote_host_ip);
    if ($the_host_should_be == substr($host, -16)) {
        // If reverse DNS lookup looks good then proceed to Forward Confirmed reverse DNS
        $ips = gethostbynamel($host);
        if ($ips !== FALSE) {
            foreach ($ips as $realip) {
                if ($realip == $remote_host_ip) {return TRUE;}
            }
        }
    }
    return FALSE;
}

function is_this_a_real_GoogleBot($remote_host_ip) {
    // http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html
    // http://en.wikipedia.org/wiki/Forward_Confirmed_reverse_DNS
    $the_host_should_be = ".googlebot.com";
    $host = gethostbyaddr($remote_host_ip);
    if ($the_host_should_be == substr($host, -14)) {
        // If reverse DNS lookup looks good then proceed to Forward Confirmed reverse DNS
        $ips = gethostbynamel($host);
        if ($ips !== FALSE) {
            foreach ($ips as $realip) {
                if ($realip == $remote_host_ip) {return TRUE;}
            }
        }
    }
    return FALSE;
}

function is_this_a_real_Alexa_ia_archiver($remote_host_ip) {
    $the_host_should_be = ".alexa.com";
    $host = gethostbyaddr($remote_host_ip);
    if ($the_host_should_be == substr($host, -10)) {
        // If reverse DNS lookup looks good then proceed to Forward Confirmed reverse DNS
        $ips = gethostbynamel($host);
        if ($ips !== FALSE) {
            foreach ($ips as $realip) {
                if ($realip == $remote_host_ip) {return TRUE;}
            }
        }
    }
    return FALSE;
}

function is_this_a_real_ArchiveORG_ia_archiver($remote_host_ip) {
    $the_host_should_be = ".archive.org";
    $host = gethostbyaddr($remote_host_ip);
    if ($the_host_should_be == substr($host, -12)) {
        // If reverse DNS lookup looks good then proceed to Forward Confirmed reverse DNS
        $ips = gethostbynamel($host);
        if ($ips !== FALSE) {
            foreach ($ips as $realip) {
                if ($realip == $remote_host_ip) {return TRUE;}
            }
        }
    }
    return FALSE;
}

function is_this_a_valid_web_crawler($remote_host_ip) {
    // This function should return TRUE as soon as possible, since it tests
    // whether an IP address belongs to a valid web crawler.
    if (is_this_a_real_msnbot($remote_host_ip)) {return TRUE;}
    elseif (is_this_a_real_YahooSlurp($remote_host_ip)) {return TRUE;}
    elseif (is_this_a_real_GoogleBot($remote_host_ip)) {return TRUE;}
    elseif (is_this_a_real_Alexa_ia_archiver($remote_host_ip)) {return TRUE;}
    elseif (is_this_a_real_ArchiveORG_ia_archiver($remote_host_ip)) {return TRUE;}
    else {return FALSE;}
}

So what do you think: is it good to use on a commercial website?
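The five per-crawler functions in the post above repeat the same forward-confirmed reverse DNS steps and differ only in the expected hostname suffix. A minimal sketch of one shared check might look like the following. The function name `ip_matches_crawler_domain` and the two resolver parameters are hypothetical (they are not in the original post); the resolvers default to PHP's `gethostbyaddr()` and `gethostbynamel()` and exist only so the logic can be exercised without real DNS. The suffixes you pass in would be the ones from the post, which date from 2006-2008 and may no longer be current.

```php
<?php
// Hypothetical helper, not part of the original post: a single
// forward-confirmed reverse DNS (FCrDNS) check for any crawler domain.
function ip_matches_crawler_domain($ip, array $allowed_suffixes,
                                   $reverse = 'gethostbyaddr',
                                   $forward = 'gethostbynamel')
{
    // gethostbynamel() only returns IPv4 addresses, so this sketch is IPv4-only.
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return false;
    }

    // Step 1: reverse lookup. gethostbyaddr() returns the unmodified IP
    // when the lookup fails and false on malformed input.
    $host = call_user_func($reverse, $ip);
    if ($host === false || $host === $ip) {
        return false;
    }
    $host = strtolower($host);

    // Step 2: the hostname must end in one of the expected suffixes.
    $suffix_ok = false;
    foreach ($allowed_suffixes as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            $suffix_ok = true;
            break;
        }
    }
    if (!$suffix_ok) {
        return false;
    }

    // Step 3: forward lookup must map the hostname back to the same IP.
    $ips = call_user_func($forward, $host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

Called as `ip_matches_crawler_domain($_SERVER['REMOTE_ADDR'], array('.googlebot.com', '.crawl.yahoo.net'))`, it always returns a boolean, including when either DNS lookup fails. Note that every call can trigger two blocking DNS queries, so on a busy site the result is probably worth caching per IP.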
JasonLewis
Posted August 9, 2010

Google around a bit, take a look at:

http://stackoverflow.com/questions/677419/how-to-detect-search-engine-bots-with-php
http://ditio.net/2008/09/07/detecting-search-engine-bots-with-php/
http://www.insanevisions.com/article/214/Tutorials/Bot-Detection-with-PHP/
http://sandaldjepit.com/2009/detect-search-engine-robot-name/

There is plenty of information on building your own script to check for bots, or you could refine a Google search for pre-made scripts that do what you want. I'm sure there is a decent script out there.
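For anyone building their own script along these lines: a common first step is to look at the User-Agent header and only spend time on DNS verification when a visitor actually claims to be a crawler. The sketch below is an illustration under assumptions, not code from any of the linked articles. The function name `claimed_crawler` is hypothetical, and the tokens are simply the crawler names that appear in the first post (Googlebot, msnbot, Slurp, ia_archiver). A User-Agent string is trivially spoofed, so a match here means "claims to be a bot" and nothing more.

```php
<?php
// Hypothetical helper: returns the name of the crawler the User-Agent
// claims to be, or null if no known token is present.
function claimed_crawler($user_agent)
{
    // Lower-case token to look for => label used by the caller.
    $tokens = array(
        'googlebot'   => 'GoogleBot',
        'msnbot'      => 'msnbot',
        'slurp'       => 'YahooSlurp',
        'ia_archiver' => 'ia_archiver',
    );

    foreach ($tokens as $needle => $name) {
        // stripos() is a case-insensitive substring search.
        if (stripos((string) $user_agent, $needle) !== false) {
            return $name;
        }
    }
    return null;
}
```

In a request handler you would pass in `$_SERVER['HTTP_USER_AGENT']` (checking with `isset()` first, since the header can be missing) and run a DNS check such as `is_this_a_valid_web_crawler($_SERVER['REMOTE_ADDR'])` from the first post only when the result is not null.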
Archived
This topic is now archived and is closed to further replies.