
advanced way to detect search bots


hollowdra


Hi guys, I was searching for an advanced way to detect search bots and found an old post (2008) in a forum that I think is good, but I need an expert's opinion. This is the code:

 

function is_this_a_real_msnbot($remote_host_ip) {
    // http://blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx
    // http://en.wikipedia.org/wiki/Forward_Confirmed_reverse_DNS
    // Expected reverse DNS name, built from the IP with dots replaced by dashes.
    $the_host_should_be = "livebot-" . str_replace(".", "-", $remote_host_ip) . ".search.live.com";
    // Do the reverse lookup once. gethostbyaddr() returns the unmodified IP when
    // the lookup fails and FALSE on malformed input, so neither can match.
    $host = gethostbyaddr($remote_host_ip);
    if ($host !== $the_host_should_be) {
        return FALSE;
    }
    // Forward-confirmed reverse DNS: the host name must resolve back to the same IP.
    $forward_ips = gethostbynamel($host); // FALSE when the name does not resolve
    return is_array($forward_ips) && in_array($remote_host_ip, $forward_ips, TRUE);
}
function is_this_a_real_YahooSlurp($remote_host_ip) {
    // http://www.seroundtable.com/archives/013781.html
    // http://en.wikipedia.org/wiki/Forward_Confirmed_reverse_DNS
    $the_host_should_be = ".crawl.yahoo.net";
    // Do the reverse lookup once; it yields the IP itself or FALSE on failure.
    $host = gethostbyaddr($remote_host_ip);
    if (!is_string($host) || substr($host, -strlen($the_host_should_be)) !== $the_host_should_be) {
        return FALSE;
    }
    // Forward-confirmed reverse DNS: the host name must resolve back to the same IP.
    $forward_ips = gethostbynamel($host); // FALSE when the name does not resolve
    return is_array($forward_ips) && in_array($remote_host_ip, $forward_ips, TRUE);
}
function is_this_a_real_GoogleBot($remote_host_ip) {
    // http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html
    // http://en.wikipedia.org/wiki/Forward_Confirmed_reverse_DNS
    $the_host_should_be = ".googlebot.com";
    // Do the reverse lookup once; it yields the IP itself or FALSE on failure.
    $host = gethostbyaddr($remote_host_ip);
    if (!is_string($host) || substr($host, -strlen($the_host_should_be)) !== $the_host_should_be) {
        return FALSE;
    }
    // Forward-confirmed reverse DNS: the host name must resolve back to the same IP.
    $forward_ips = gethostbynamel($host); // FALSE when the name does not resolve
    return is_array($forward_ips) && in_array($remote_host_ip, $forward_ips, TRUE);
}
function is_this_a_real_Alexa_ia_archiver($remote_host_ip) {
    $the_host_should_be = ".alexa.com";
    // Do the reverse lookup once; it yields the IP itself or FALSE on failure.
    $host = gethostbyaddr($remote_host_ip);
    if (!is_string($host) || substr($host, -strlen($the_host_should_be)) !== $the_host_should_be) {
        return FALSE;
    }
    // Forward-confirmed reverse DNS: the host name must resolve back to the same IP.
    $forward_ips = gethostbynamel($host); // FALSE when the name does not resolve
    return is_array($forward_ips) && in_array($remote_host_ip, $forward_ips, TRUE);
}
function is_this_a_real_ArchiveORG_ia_archiver($remote_host_ip) {
    $the_host_should_be = ".archive.org";
    // Do the reverse lookup once; it yields the IP itself or FALSE on failure.
    $host = gethostbyaddr($remote_host_ip);
    if (!is_string($host) || substr($host, -strlen($the_host_should_be)) !== $the_host_should_be) {
        return FALSE;
    }
    // Forward-confirmed reverse DNS: the host name must resolve back to the same IP.
    $forward_ips = gethostbynamel($host); // FALSE when the name does not resolve
    return is_array($forward_ips) && in_array($remote_host_ip, $forward_ips, TRUE);
}

function is_this_a_valid_web_crawler($remote_host_ip) {
    // Returns TRUE as soon as one check confirms that the IP address belongs to a
    // valid web crawler; || short-circuits, so later lookups are skipped.
    // The Yahoo check was defined in the original post but never called; it is included here.
    return is_this_a_real_msnbot($remote_host_ip)
        || is_this_a_real_YahooSlurp($remote_host_ip)
        || is_this_a_real_GoogleBot($remote_host_ip)
        || is_this_a_real_Alexa_ia_archiver($remote_host_ip)
        || is_this_a_real_ArchiveORG_ia_archiver($remote_host_ip);
}
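
The per-crawler functions above all follow the same forward-confirmed reverse DNS pattern, so they could probably be folded into one helper. Below is only a rough sketch of that idea, not the original author's code: `fcrdns_matches()` and its parameters are made-up names, the two resolver arguments exist purely so the logic can be tried without real DNS, and the sample host and IP (from the 192.0.2.0/24 documentation range) are invented for illustration.

```php
<?php
// Hypothetical helper (not from the original post): forward-confirmed reverse DNS
// against a list of allowed host name suffixes. The resolvers are injectable so
// the logic can be exercised with fake data instead of live lookups.
function fcrdns_matches(string $ip, array $allowed_suffixes, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    $host = $reverse($ip);
    // gethostbyaddr() returns the IP unchanged on failure and FALSE on malformed input.
    if (!is_string($host) || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    $suffix_ok = false;
    foreach ($allowed_suffixes as $suffix) {
        // Suffixes start with a dot so that "evilgooglebot.com" cannot match ".googlebot.com".
        if (substr($host, -strlen($suffix)) === $suffix) {
            $suffix_ok = true;
            break;
        }
    }
    if (!$suffix_ok) {
        return false;
    }

    // Forward confirmation: the name must resolve back to the original IP.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}

// Usage with fake resolvers (invented sample data, no network access needed).
$fake_reverse = function ($ip) {
    return $ip === '192.0.2.10' ? 'crawl-192-0-2-10.googlebot.com' : $ip;
};
$fake_forward = function ($host) {
    return $host === 'crawl-192-0-2-10.googlebot.com' ? ['192.0.2.10'] : false;
};
var_dump(fcrdns_matches('192.0.2.10', ['.googlebot.com'], $fake_reverse, $fake_forward)); // bool(true)
var_dump(fcrdns_matches('192.0.2.99', ['.googlebot.com'], $fake_reverse, $fake_forward)); // bool(false)
```

In real use the last two arguments would be left out so that PHP's own `gethostbyaddr()` and `gethostbynamel()` are called. Both block on DNS and `gethostbynamel()` only returns IPv4 addresses, so caching the result per IP is likely worth considering on a busy site.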

 

So what do you think: is it good to use on a commercial website?

https://forums.phpfreaks.com/topic/210147-advanced-way-to-detect-search-bots/

Google around a bit, take a look at:

 

http://stackoverflow.com/questions/677419/how-to-detect-search-engine-bots-with-php

http://ditio.net/2008/09/07/detecting-search-engine-bots-with-php/

http://www.insanevisions.com/article/214/Tutorials/Bot-Detection-with-PHP/

http://sandaldjepit.com/2009/detect-search-engine-robot-name/

 

There is plenty of information there on building your own script to check for bots, or you could refine a Google search to find pre-made scripts that do what you want. I'm sure there is a decent script out there.
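
If you do build your own script, one common starting point is a cheap User-Agent check, something like the sketch below. Treat it as an illustration only: `claimed_crawler()` is a made-up function name and the token list is an assumption for the example, not an authoritative registry of crawler names.

```php
<?php
// Hypothetical helper: report which crawler a User-Agent string claims to be.
// The tokens below are illustrative assumptions, not a complete list.
function claimed_crawler(string $user_agent): ?string
{
    $tokens = [
        'googlebot'   => 'google',
        'msnbot'      => 'msn',
        'slurp'       => 'yahoo',
        'ia_archiver' => 'ia_archiver',
    ];
    foreach ($tokens as $needle => $name) {
        // Case-insensitive substring match on the claimed crawler token.
        if (stripos($user_agent, $needle) !== false) {
            return $name;
        }
    }
    return null; // no known crawler token found
}

// Example: read the header defensively, it may be missing entirely.
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
$claim = claimed_crawler($ua);
```

A User-Agent header is trivial to spoof, so this only tells you what a visitor claims to be. It can still serve as a prefilter: run the slower DNS-based `is_this_a_valid_web_crawler()` from the first post only when `claimed_crawler()` returns something other than null.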

Archived

This topic is now archived and is closed to further replies.
