jaymc Posted July 1, 2008

Is this the best way to detect a search engine crawling my website?

<?php
$botlist = array(
    "Teoma", "alexa", "froogle", "inktomi", "looksmart", "URL_Spider_SQL",
    "Firefly", "NationalDirectory", "Ask Jeeves", "TECNOSEEK", "InfoSeek",
    "WebFindBot", "girafabot", "crawler", "www.galaxy.com", "Googlebot",
    "Scooter", "Slurp", "appie", "FAST", "WebBug", "Spade", "ZyBorg", "rabaz"
);

foreach ($botlist as $bot) {
    // case-sensitive substring match against the User-Agent header
    if (strpos($_SERVER['HTTP_USER_AGENT'], $bot) !== false) {
        // tell the two Google crawlers apart by address prefix
        if ($bot == "Googlebot") {
            if (substr($REMOTE_HOST, 0, 11) == "216.239.46.")
                $bot = "Googlebot Deep Crawl";
            elseif (substr($REMOTE_HOST, 0, 7) == "64.68.8")
                $bot = "Google Freshbot";
        }

        // rebuild the URL that was requested
        if ($QUERY_STRING != "") {
            $url = "http://" . $SERVER_NAME . $PHP_SELF . "?" . $QUERY_STRING . "";
        } else {
            $url = "http://" . $SERVER_NAME . $PHP_SELF . "";
        }

        // settings
        $to = "email@your-domain.com";
        $subject = "Detected: $bot on $url";
        $body = "$bot was detected on $url\n\n
Date.............: " . date("F j, Y, g:i a") . "
Page.............: " . $url . "
Robot Name.......: " . $_SERVER['HTTP_USER_AGENT'] . "
Robot Address....: " . $REMOTE_ADDR . "
Robot Host.......: " . $REMOTE_HOST . "
";
        mail($to, $subject, $body);
    }
}
?>

Link to comment https://forums.phpfreaks.com/topic/112840-solved-detect-search-engines/
donbueck Posted July 2, 2008

I wouldn't have it email you every time a bot hits a page; that could be a lot of emails. Also, you might want to use $_SERVER['REMOTE_ADDR'] to get the IP address. Also, check out the get_browser() function; it might have some tidbits for you. Could you get away with using the Apache logs to track bots after they visit?
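To make the suggestions above concrete, here is a minimal sketch that reads the address from $_SERVER['REMOTE_ADDR'] and appends one line per bot hit to a log file instead of sending an email. It assumes PHP 7.1 or later; the helper names detect_bot() and format_bot_log_line(), the shortened bot list, and the bot-hits.log path are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical helper: returns the first bot name found in the
// User-Agent string, or null when nothing in the list matches.
function detect_bot(string $userAgent, array $botList): ?string
{
    foreach ($botList as $bot) {
        // stripos() is a case-insensitive substring search
        if (stripos($userAgent, $bot) !== false) {
            return $bot;
        }
    }
    return null;
}

// Hypothetical helper: builds one tab-separated log line.
function format_bot_log_line(string $bot, string $ip, string $uri, string $userAgent): string
{
    return implode("\t", [date('c'), $bot, $ip, $uri, $userAgent]) . "\n";
}

// Shortened list for illustration only.
$botList   = ['Googlebot', 'Slurp', 'Teoma'];
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

$bot = detect_bot($userAgent, $botList);
if ($bot !== null) {
    $line = format_bot_log_line(
        $bot,
        $_SERVER['REMOTE_ADDR'] ?? '-',
        $_SERVER['REQUEST_URI'] ?? '-',
        $userAgent
    );
    // Placeholder path; append with a lock so concurrent requests
    // do not interleave their lines.
    file_put_contents(__DIR__ . '/bot-hits.log', $line, FILE_APPEND | LOCK_EX);
}
```

Returning after the first match means a User-Agent that contains two list entries (for example "crawler" and "Googlebot") is recorded once rather than twice.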
jaymc (Author) Posted July 2, 2008

Yeah, ignore the email bit, that's just for testing purposes.

I think the main point of this question is: can I rely on the browser headers sent, and will Googlebot always have "Googlebot" in there somewhere?

I don't want it to half work, as that would in essence totally lock Google and other search engines out of my website.

Can I rely on this method? If not, is there a better solution?
donbueck Posted July 3, 2008

Yes, to the best of my knowledge Google will always show up in the headers as Googlebot. Another problem to consider is that spammers can spoof those headers. See this Google blog entry.
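Because the User-Agent header can be spoofed, a commonly described check for a claimed Googlebot is a reverse DNS lookup on the visiting IP, followed by a forward lookup on the resulting hostname to confirm it maps back to the same IP. The sketch below illustrates that idea under some assumptions: PHP 7.1 or later, IPv4 only (gethostbynamel() returns IPv4 addresses), and accepted hostnames ending in googlebot.com or google.com. The function name is_verified_googlebot() and the injectable lookup parameters are hypothetical, added so the logic can be exercised without network access.

```php
<?php
// Hypothetical helper: true only if $ip reverse-resolves to a Google
// crawler hostname AND that hostname resolves back to the same $ip.
// $reverse and $forward default to PHP's own DNS functions.
function is_verified_googlebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the unmodified
    // IP on failure and false on malformed input.
    $host = $reverse($ip);
    if ($host === false || $host === $ip) {
        return false;
    }

    // Step 2: the hostname must end in .googlebot.com or .google.com
    // (assumed list of accepted domains).
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }

    // Step 3: forward lookup. gethostbynamel() returns an array of
    // IPv4 addresses, or false on failure.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

In a page this would be called as is_verified_googlebot($_SERVER['REMOTE_ADDR']) only after the User-Agent already claims to be Googlebot. DNS lookups are slow, so caching the result per IP is worth considering.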
jaymc (Author) Posted July 3, 2008

Cheers