exclude spiders and bots from logs


rubing

No, but you can read the user agent. Be warned that it can easily be spoofed!

getenv("HTTP_USER_AGENT");

 

I use the following to stop search engine spiders from starting sessions.

 

function isSpider($userAgent) {
	// Substrings that identify common search engine crawlers.
	$spiders = array(
		"Googlebot",    /* Google */
		"Slurp",        /* Inktomi/Y! */
		"MSNBOT",       /* MSN */
		"teoma",        /* Teoma */
		"ia_archiver",  /* Alexa */
		"Scooter",      /* Altavista */
		"Mercator",     /* Altavista */
		"FAST",         /* AllTheWeb */
		"MantraAgent",  /* LookSmart */
		"Lycos",        /* Lycos */
		"ZyBorg"        /* WISEnut */
	);

	// getenv() returns false when the header is missing, so cast to string first.
	foreach ($spiders as $spider) {
		// Case-insensitive match anywhere in the user agent.
		if (stripos((string) $userAgent, $spider) !== false) {
			return true;
		}
	}
	return false;
}



if(isSpider(getenv("HTTP_USER_AGENT"))) {
  // redirect bots
}
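To make the "no sessions for spiders" idea concrete, here is a minimal sketch of one way to wire it up. The names `looksLikeSpider` and `shouldStartSession` are hypothetical, the spider list is shortened for brevity, and treating a missing or empty user agent like a bot is my own assumption, not something the function above does.

```php
<?php
// Hypothetical helper: a shortened version of the substring check above.
function looksLikeSpider(string $userAgent): bool
{
    $needles = ['Googlebot', 'Slurp', 'msnbot', 'ia_archiver'];
    foreach ($needles as $needle) {
        // Case-insensitive substring match.
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Decide whether this request should get a session.
// Assumption: a missing or empty User-Agent is treated like a bot.
function shouldStartSession($userAgent): bool
{
    if (!is_string($userAgent) || $userAgent === '') {
        return false;
    }
    return !looksLikeSpider($userAgent);
}

// getenv() returns false when the header is not set.
if (shouldStartSession(getenv('HTTP_USER_AGENT'))) {
    session_start();
}
```

Since the agent string can be spoofed, this only keeps well-behaved crawlers from creating sessions; it is not an access control.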

 

The agent will still appear in the server access logs, however.

I guess I was a little concerned because I have often read about crawlers, spiders, bots, etc. that ignore your rules. Sorry, I should've been more specific. Last year I was browsing a book about webbots, and it mentioned that some sites employ advanced methods for detecting them. I guess it's silly, though, to worry about such a potentially small source of traffic!

 

I like the isSpider function!

If you don't want anyone to read the logs, you can block access to them in your Apache config.

 

If they are malicious web bots that read the robots.txt file and then visit every path listed in it, you can set a trap: add an entry that leads them to a page where their user agent is recorded and then blocked from the site completely.

 

Just a note: it might be a good idea to add some of the popular user agents (Firefox, IE, etc.) to an exclusion list, so that a curious visitor doesn't end up blocking every visitor who uses the same browser.
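The trap and the browser exclusion could be sketched roughly like this. Everything here is an assumption for illustration: the function names, the plain-text blocklist file, and the list of browser substrings are all hypothetical. The idea is that robots.txt lists a path under `Disallow` that nothing legitimate should request, and the page behind that path calls `recordTrapHit()`.

```php
<?php
// Return true if the agent looks like a mainstream browser that should never be banned.
// The substring list is illustrative, not exhaustive.
function isCommonBrowser(string $userAgent): bool
{
    foreach (['Firefox', 'MSIE', 'Chrome', 'Safari', 'Opera'] as $needle) {
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Return true if the agent is already listed in the blocklist file (one agent per line).
function isBlockedAgent(string $userAgent, string $blocklistFile): bool
{
    if (!is_file($blocklistFile)) {
        return false;
    }
    $blocked = file($blocklistFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];
    return in_array(trim($userAgent), $blocked, true);
}

// Called from the trap page: append the agent to the blocklist.
// Returns true only if a new entry was written.
function recordTrapHit(string $userAgent, string $blocklistFile): bool
{
    // Keep each entry on a single line.
    $userAgent = trim(str_replace(["\r", "\n"], ' ', $userAgent));
    if ($userAgent === '' || isCommonBrowser($userAgent)) {
        return false; // never ban blank agents or ordinary browsers
    }
    if (isBlockedAgent($userAgent, $blocklistFile)) {
        return false; // already listed
    }
    return file_put_contents($blocklistFile, $userAgent . "\n", FILE_APPEND | LOCK_EX) !== false;
}

// On the trap page (hypothetical path and file):
//   recordTrapHit((string) getenv('HTTP_USER_AGENT'), '/path/to/blocked_agents.txt');
// On every other page:
//   if (isBlockedAgent((string) getenv('HTTP_USER_AGENT'), '/path/to/blocked_agents.txt')) { /* deny */ }
```

The exclusion cuts both ways: a bot that spoofs a browser agent will slip past the trap, so this mostly catches crawlers that announce themselves honestly.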
