rubing Posted September 2, 2008
I am trying to build a simple ad server in PHP. How do I exclude logging the activity of any spiders, bots, etc.? Is there an easy way to exclude all of them using the $_SERVER variables or something of that nature? Thanks!
Link to comment: https://forums.phpfreaks.com/topic/122379-exclude-spiders-and-bots-from-logs/
JonnoTheDev Posted September 2, 2008
Are you talking about your server access logs or logs that you are creating yourself? Do you want spiders blocked from your site? If so, you can exclude them in a robots.txt file.
cooldude832 Posted September 2, 2008
I believe Googlebot passes something along that you can find in the $_SERVER variable, but you will need to Google it to figure it out.
JonnoTheDev Posted September 2, 2008
No, you can get the agent - but be warned it can easily be spoofed!

    // Read the visitor's user agent string (client-supplied, so not trustworthy).
    getenv("HTTP_USER_AGENT");

I use the following to stop search engine spiders from starting sessions.

    // Returns true when the user agent contains a known search engine marker.
    // stristr() is case-insensitive, so a short marker such as "FAST" also
    // matches any agent that merely contains "fast".
    function isSpider($userAgent) {
        if (stristr($userAgent, "Googlebot")   || /* Google */
            stristr($userAgent, "Slurp")       || /* Inktomi/Y! */
            stristr($userAgent, "MSNBOT")      || /* MSN */
            stristr($userAgent, "teoma")       || /* Teoma */
            stristr($userAgent, "ia_archiver") || /* Alexa */
            stristr($userAgent, "Scooter")     || /* Altavista */
            stristr($userAgent, "Mercator")    || /* Altavista */
            stristr($userAgent, "FAST")        || /* AllTheWeb */
            stristr($userAgent, "MantraAgent") || /* LookSmart */
            stristr($userAgent, "Lycos")       || /* Lycos */
            stristr($userAgent, "ZyBorg")) {      /* WISEnut */
            return true;
        }
        return false;
    }

    if (isSpider(getenv("HTTP_USER_AGENT"))) {
        // redirect bots
    }

The agent will still be in the server access logs, however.
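For the original question (keeping bot hits out of an ad server's own log), the same user agent check can gate the write. This is only a minimal sketch: `BOT_MARKERS`, `looksLikeBot()`, `logImpression()`, the tab-separated log format, and the choice to treat a missing user agent as a bot are all assumptions made for illustration, not something posted in the thread.

```php
<?php
// Hypothetical marker list; extend it with whatever crawlers show up in your traffic.
const BOT_MARKERS = ['Googlebot', 'Slurp', 'msnbot', 'Teoma', 'ia_archiver'];

/**
 * Return true when the user agent contains any known crawler marker.
 * An empty or missing user agent is treated as a bot here (an assumption).
 */
function looksLikeBot(?string $userAgent): bool
{
    if ($userAgent === null || $userAgent === '') {
        return true;
    }
    foreach (BOT_MARKERS as $marker) {
        // stripos() is case-insensitive and returns false when there is no match.
        if (stripos($userAgent, $marker) !== false) {
            return true;
        }
    }
    return false;
}

/**
 * Append one impression line to the log unless the visitor looks like a bot.
 * Returns true when a line was written.
 */
function logImpression(string $logFile, int $adId, ?string $userAgent): bool
{
    if (looksLikeBot($userAgent)) {
        return false;
    }
    $line = sprintf("%s\t%d\t%s\n", date('c'), $adId, $userAgent);
    return file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX) !== false;
}
```

A call would look something like `logImpression($logFile, $adId, $_SERVER['HTTP_USER_AGENT'] ?? null);`. As noted above, this only filters the ad server's own log; the web server's access log still records every request.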
rubing (Author) Posted September 2, 2008
I guess I was a little concerned because I have often read about crawlers, spiders, bots, etc. that ignore your rules. Sorry, I should've been more specific. Last year I was browsing a book about webbots, and it mentioned that some sites employ advanced methods for detecting them. I guess it's silly, though, to worry about such a potentially small source of traffic! I like the isSpider function!
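On the spoofing warning earlier in the thread: one stricter check, which nobody here posted, is to confirm a claimed Googlebot by reverse DNS and then a forward lookup on the returned host name. The sketch below assumes the host names end in googlebot.com or google.com (re-check that against Google's current guidance); `isVerifiedGooglebot()` and its two resolver parameters are hypothetical, added so the logic can be tried without network access.

```php
<?php
/**
 * Check whether $ip plausibly belongs to Googlebot: reverse lookup, domain
 * check, then a confirming forward lookup. $reverse and $forward are
 * hypothetical hooks that default to PHP's own resolver functions.
 */
function isVerifiedGooglebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbyname';

    $host = $reverse($ip);
    // gethostbyaddr() returns the IP unchanged on failure and false on malformed input.
    if (!is_string($host) || $host === $ip) {
        return false;
    }
    // Assumed domain list; anchoring at the end rejects names like "googlebot.com.evil.example".
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }
    // gethostbyname() resolves IPv4 only, so this sketch does not cover IPv6 clients.
    return $forward($host) === $ip;
}
```

In a request handler this might be called as `isVerifiedGooglebot($_SERVER['REMOTE_ADDR'])`. DNS lookups are slow, so caching the result per IP is probably worth it for anything beyond occasional use.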
natbob Posted September 2, 2008
If you don't want anyone to read the logs, you can block access to them in your Apache config.
If they are malicious web bots that read the robots.txt file and then visit all of the listed places, you can set a trap by adding an entry that leads them to a page where their user agent is recorded and blocked from the site completely.
Just a note: it might be a good idea to add some of the popular user agents (FF, IE, etc.) to an exclusion list so that a curious visitor doesn't end up blocking all of the visitors using their browser.
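The trap idea could be sketched roughly as follows, assuming robots.txt carries a Disallow line for some hypothetical trap page and that page calls `recordTrappedAgent()`. The function names, the one-agent-per-line blocklist file, and the `BROWSER_MARKERS` list are all made up for illustration. Note that a bot sending a browser-like user agent slips straight through the exclusion, so this only catches crawlers that identify themselves.

```php
<?php
// Hypothetical substrings for mainstream browsers that must never be blocked.
const BROWSER_MARKERS = ['Firefox', 'MSIE', 'Chrome', 'Safari', 'Opera'];

/** Return true when the user agent looks like a mainstream browser. */
function isCommonBrowser(string $userAgent): bool
{
    foreach (BROWSER_MARKERS as $marker) {
        if (stripos($userAgent, $marker) !== false) {
            return true;
        }
    }
    return false;
}

/** Read the blocklist (one user agent per line); a missing file means an empty list. */
function loadBlocklist(string $blocklistFile): array
{
    if (!is_file($blocklistFile)) {
        return [];
    }
    return file($blocklistFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];
}

/**
 * Called from the trap page: remember the visitor's user agent unless it is
 * empty, a common browser, or already listed. Returns true when it was added.
 */
function recordTrappedAgent(string $blocklistFile, string $userAgent): bool
{
    // Collapse line breaks so one agent always occupies exactly one line.
    $userAgent = trim(str_replace(["\r", "\n"], ' ', $userAgent));
    if ($userAgent === '' || isCommonBrowser($userAgent)) {
        return false;
    }
    if (in_array($userAgent, loadBlocklist($blocklistFile), true)) {
        return false;
    }
    return file_put_contents($blocklistFile, $userAgent . "\n", FILE_APPEND | LOCK_EX) !== false;
}

/** Called on normal pages: true when this exact user agent was trapped earlier. */
function isBlockedAgent(string $blocklistFile, string $userAgent): bool
{
    return in_array(trim($userAgent), loadBlocklist($blocklistFile), true);
}
```

Regular pages could then check `isBlockedAgent()` early and answer with a 403 for listed agents.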
rubing (Author) Posted September 4, 2008
SWEET!!!