
advanced way to detect search bots


hollowdra


Hi guys, I was searching for an advanced way to detect search bots and found an old post (2008) in the forum that I think is good,

but I need an expert's opinion. This is the code:

 

function is_this_a_real_msnbot($remote_host_ip) {
    // http://blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx
    // http://en.wikipedia.org/wiki/Forward_Confirmed_reverse_DNS
    $the_host_should_be = "livebot-" . str_replace(".", "-", $remote_host_ip) . ".search.live.com";
    $host = gethostbyaddr($remote_host_ip); // Reverse DNS lookup, done once
    if ($host !== $the_host_should_be) { return FALSE; } // Also covers a failed lookup (FALSE or the unchanged IP)
    $forward_ips = gethostbynamel($host); // Forward lookup; FALSE if the host does not resolve
    // Forward-confirmed reverse DNS: the host must resolve back to the same IP
    return is_array($forward_ips) && in_array($remote_host_ip, $forward_ips, TRUE);
}
function is_this_a_real_YahooSlurp($remote_host_ip) {
    // http://www.seroundtable.com/archives/013781.html
    // http://en.wikipedia.org/wiki/Forward_Confirmed_reverse_DNS
    $the_host_should_be = ".crawl.yahoo.net";
    $host = gethostbyaddr($remote_host_ip); // Reverse DNS lookup, done once
    if (!is_string($host) || substr($host, -strlen($the_host_should_be)) !== $the_host_should_be) { return FALSE; }
    $forward_ips = gethostbynamel($host); // Forward lookup; FALSE if the host does not resolve
    // Forward-confirmed reverse DNS: the host must resolve back to the same IP
    return is_array($forward_ips) && in_array($remote_host_ip, $forward_ips, TRUE);
}
function is_this_a_real_GoogleBot($remote_host_ip) {
    // http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html
    // http://en.wikipedia.org/wiki/Forward_Confirmed_reverse_DNS
    $the_host_should_be = ".googlebot.com";
    $host = gethostbyaddr($remote_host_ip); // Reverse DNS lookup, done once
    if (!is_string($host) || substr($host, -strlen($the_host_should_be)) !== $the_host_should_be) { return FALSE; }
    $forward_ips = gethostbynamel($host); // Forward lookup; FALSE if the host does not resolve
    // Forward-confirmed reverse DNS: the host must resolve back to the same IP
    return is_array($forward_ips) && in_array($remote_host_ip, $forward_ips, TRUE);
}
function is_this_a_real_Alexa_ia_archiver($remote_host_ip) {
    $the_host_should_be = ".alexa.com";
    $host = gethostbyaddr($remote_host_ip); // Reverse DNS lookup, done once
    if (!is_string($host) || substr($host, -strlen($the_host_should_be)) !== $the_host_should_be) { return FALSE; }
    $forward_ips = gethostbynamel($host); // Forward lookup; FALSE if the host does not resolve
    // Forward-confirmed reverse DNS: the host must resolve back to the same IP
    return is_array($forward_ips) && in_array($remote_host_ip, $forward_ips, TRUE);
}
function is_this_a_real_ArchiveORG_ia_archiver($remote_host_ip) {
    $the_host_should_be = ".archive.org";
    $host = gethostbyaddr($remote_host_ip); // Reverse DNS lookup, done once
    if (!is_string($host) || substr($host, -strlen($the_host_should_be)) !== $the_host_should_be) { return FALSE; }
    $forward_ips = gethostbynamel($host); // Forward lookup; FALSE if the host does not resolve
    // Forward-confirmed reverse DNS: the host must resolve back to the same IP
    return is_array($forward_ips) && in_array($remote_host_ip, $forward_ips, TRUE);
}

// Tests whether an IP address belongs to a valid web crawler.
// Returns TRUE as soon as one check passes, so the remaining DNS lookups are skipped.
function is_this_a_valid_web_crawler($remote_host_ip) {
    return is_this_a_real_msnbot($remote_host_ip)
        || is_this_a_real_YahooSlurp($remote_host_ip) // Defined above but never called in the original
        || is_this_a_real_GoogleBot($remote_host_ip)
        || is_this_a_real_Alexa_ia_archiver($remote_host_ip)
        || is_this_a_real_ArchiveORG_ia_archiver($remote_host_ip);
}

 

So what do you think: is it good to use on a commercial website?
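
The five per-crawler functions posted above all follow the same forward-confirmed reverse DNS pattern, so one way to tidy them up is a single helper with a list of allowed host suffixes. The following is only a rough sketch: the name is_verified_crawler_ip, the $crawler_suffixes array and the injectable $reverse / $forward parameters are hypothetical (the last two exist so the logic can be tested without real DNS). Note that it matches msnbot by suffix only, which is looser than the exact host name check in the posted code.

```php
<?php
// Hypothetical helper: forward-confirmed reverse DNS against a suffix allow-list.
// $reverse and $forward default to the real DNS functions but can be swapped
// for stubs, which is an assumption made here purely for testability.
function is_verified_crawler_ip($ip, array $allowed_suffixes, $reverse = 'gethostbyaddr', $forward = 'gethostbynamel') {
    $host = call_user_func($reverse, $ip);
    // gethostbyaddr() returns FALSE on malformed input and the unchanged IP on failure.
    if (!is_string($host) || $host === $ip) {
        return false;
    }
    $host = strtolower($host);

    // The reverse name must end with one of the allowed suffixes.
    $suffix_ok = false;
    foreach ($allowed_suffixes as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            $suffix_ok = true;
            break;
        }
    }
    if (!$suffix_ok) {
        return false;
    }

    // Forward confirmation: the host name must resolve back to the same IP.
    $ips = call_user_func($forward, $host);
    return is_array($ips) && in_array($ip, $ips, true);
}

// Suffixes copied from the functions posted above.
$crawler_suffixes = array(
    '.search.live.com',
    '.crawl.yahoo.net',
    '.googlebot.com',
    '.alexa.com',
    '.archive.org',
);
```

With real DNS it would be called as is_verified_crawler_ip($_SERVER['REMOTE_ADDR'], $crawler_suffixes). Each call can trigger two DNS lookups, so on a busy site it is probably worth caching the result per IP.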


Google around a bit and take a look at:

 

http://stackoverflow.com/questions/677419/how-to-detect-search-engine-bots-with-php

http://ditio.net/2008/09/07/detecting-search-engine-bots-with-php/

http://www.insanevisions.com/article/214/Tutorials/Bot-Detection-with-PHP/

http://sandaldjepit.com/2009/detect-search-engine-robot-name/

 

Plenty of information on building your own script to check for bots, or you could refine a Google search for pre-made scripts that do what you want. I'm sure there would be a decent script out there.
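
If you do build your own script, one common starting point is the User-Agent header. It is trivial to fake, so it cannot prove a visitor is a real crawler, but it works as a cheap pre-filter: only requests that claim to be a crawler need the slower DNS verification. Here is a minimal sketch of that idea. The function name claimed_crawler and the token list are hypothetical and far from complete.

```php
<?php
// Hypothetical helper: report which crawler a User-Agent string claims to be.
// The token list is illustrative only; real crawlers use many more strings.
function claimed_crawler($user_agent) {
    $tokens = array(
        'Googlebot'   => 'google',
        'msnbot'      => 'msn',
        'Slurp'       => 'yahoo',
        'ia_archiver' => 'ia_archiver',
    );
    foreach ($tokens as $token => $name) {
        // Case-insensitive substring match against the header value.
        if (stripos($user_agent, $token) !== false) {
            return $name;
        }
    }
    return null; // Does not claim to be any of the listed crawlers
}
```

In a page this could gate the DNS check from the first post, for example by calling is_this_a_valid_web_crawler($_SERVER['REMOTE_ADDR']) only when claimed_crawler($_SERVER['HTTP_USER_AGENT']) returns something other than null (after checking that the header is set at all).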
