
exclude spiders and bots from logs


rubing


No, but you can get the user agent. Be warned, though: it can easily be spoofed!

getenv("HTTP_USER_AGENT");
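A minimal sketch of reading the header defensively, assuming it may be missing entirely. `currentUserAgent()` is a hypothetical helper, not part of PHP. The value comes straight from the client, so anyone can send "Googlebot" from a normal script.

```php
<?php
// Hypothetical helper: return the User-Agent request header, or "" if the
// client did not send one. The value is client-controlled, so treat it as a
// hint, never as proof of who is visiting.
function currentUserAgent(): string
{
    $ua = getenv("HTTP_USER_AGENT");              // false when not set
    if ($ua === false) {
        $ua = $_SERVER["HTTP_USER_AGENT"] ?? ""; // fallback for SAPIs that only populate $_SERVER
    }
    return (string) $ua;
}

$ua = currentUserAgent();
```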


I use the following to stop search engine (SE) spiders from starting sessions:


function isSpider($userAgent) {
    if (stristr($userAgent, "Googlebot")   ||  /* Google */
        stristr($userAgent, "Slurp")       ||  /* Inktomi/Y! */
        stristr($userAgent, "MSNBOT")      ||  /* MSN */
        stristr($userAgent, "teoma")       ||  /* Teoma */
        stristr($userAgent, "ia_archiver") ||  /* Alexa */
        stristr($userAgent, "Scooter")     ||  /* Altavista */
        stristr($userAgent, "Mercator")    ||  /* Altavista */
        stristr($userAgent, "FAST")        ||  /* AllTheWeb */
        stristr($userAgent, "MantraAgent") ||  /* LookSmart */
        stristr($userAgent, "Lycos")       ||  /* Lycos */
        stristr($userAgent, "ZyBorg")) {       /* WISEnut */
        return true;
    }
    return false;
}



if (isSpider(getenv("HTTP_USER_AGENT"))) {
  // spider detected: redirect it, or simply skip session_start() for it
}
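The same check can be kept in a list so new spiders are a one-line addition. This is a sketch, not the poster's code: `SPIDER_SIGNATURES` and `isSpiderList()` are hypothetical names, and the signatures are copied from `isSpider()` above. Note that short tokens such as "FAST" match anywhere in the string, case-insensitively.

```php
<?php
// Hypothetical list-driven variant of isSpider(): same case-insensitive
// substring test, but the signatures live in one array.
const SPIDER_SIGNATURES = [
    "Googlebot",   // Google
    "Slurp",       // Inktomi/Y!
    "MSNBOT",      // MSN
    "teoma",       // Teoma
    "ia_archiver", // Alexa
    "Scooter",     // Altavista
    "Mercator",    // Altavista
    "FAST",        // AllTheWeb
    "MantraAgent", // LookSmart
    "Lycos",       // Lycos
    "ZyBorg",      // WISEnut
];

function isSpiderList(?string $userAgent): bool
{
    if ($userAgent === null || $userAgent === "") {
        return false; // no header sent: nothing to match against
    }
    foreach (SPIDER_SIGNATURES as $signature) {
        // stripos() returns the match offset (possibly 0) or false
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}
```

To keep spiders out of sessions, a page could call `session_start()` only when `isSpiderList(getenv("HTTP_USER_AGENT") ?: null)` is false.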


The user agent will still appear in the server's access logs, however.
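If the goal is reports without spider hits, one option is to filter the log after the fact. A minimal sketch, assuming Apache's "combined" log format (user agent is the last double-quoted field) and user agents without escaped quotes; `userAgentFromLogLine()`, `withoutSpiders()` and the log path are hypothetical.

```php
<?php
// Extract the last "..." field on the line, which is the user agent in the
// combined log format. Returns "" if no quoted field ends the line.
function userAgentFromLogLine(string $line): string
{
    if (preg_match('/"([^"]*)"\s*$/', $line, $m) === 1) {
        return $m[1];
    }
    return "";
}

// Keep only lines whose user agent contains none of the given signatures
// (case-insensitive substring match, like isSpider()).
function withoutSpiders(array $lines, array $signatures): array
{
    $kept = [];
    foreach ($lines as $line) {
        $ua = userAgentFromLogLine($line);
        $isBot = false;
        foreach ($signatures as $signature) {
            if (stripos($ua, $signature) !== false) {
                $isBot = true;
                break;
            }
        }
        if (!$isBot) {
            $kept[] = $line;
        }
    }
    return $kept;
}

// Hypothetical usage (path is an assumption):
// $lines = file('/var/log/apache2/access.log', FILE_IGNORE_NEW_LINES);
// echo implode(PHP_EOL, withoutSpiders($lines, ["Googlebot", "Slurp", "MSNBOT"])), PHP_EOL;
```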


I guess I was a little concerned because I have often read about crawlers, spiders, bots, etc. that ignore your rules... sorry, I should've been more specific. Last year I was browsing a book about webbots, and it mentioned that some sites employ advanced methods for detecting them. I guess it's silly, though, to worry about such a potentially small source of traffic!


I like the isSpider function!


If you don't want anyone to read the logs, you can block access to them in your Apache config.


If they are malicious web bots that read the robots.txt file and then visit all of the listed (disallowed) paths, you can set a trap: add an entry that leads them to a page where their user agent is recorded and then blocked from the site completely.
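A rough sketch of such a trap, assuming robots.txt contains `Disallow: /bot-trap/` and that agents are stored one per line in a plain text file. The trap path, the file location, and the helpers `recordTrappedAgent()` and `isBlocked()` are all hypothetical.

```php
<?php
// Assumed robots.txt entry:
//   User-agent: *
//   Disallow: /bot-trap/
// Well-behaved crawlers never request /bot-trap/, so anything that does is
// treated as a bad bot.

// Append the visitor's user agent to the blocklist (one agent per line).
function recordTrappedAgent(string $blocklistFile, string $userAgent): void
{
    if ($userAgent === "") {
        return; // nothing identifiable to record
    }
    // LOCK_EX keeps concurrent trap hits from interleaving their writes.
    file_put_contents($blocklistFile, $userAgent . "\n", FILE_APPEND | LOCK_EX);
}

// True when this exact user agent string was previously caught by the trap.
function isBlocked(string $blocklistFile, string $userAgent): bool
{
    if (!is_file($blocklistFile)) {
        return false;
    }
    $blocked = file($blocklistFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return in_array($userAgent, $blocked, true);
}

// In /bot-trap/index.php (hypothetical):
//   recordTrappedAgent('/path/to/blocked_agents.txt', $_SERVER['HTTP_USER_AGENT'] ?? '');
// At the top of every other page:
//   if (isBlocked('/path/to/blocked_agents.txt', $_SERVER['HTTP_USER_AGENT'] ?? '')) {
//       http_response_code(403);
//       exit;
//   }
```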


Just a note: it might be a good idea to add some of the popular user agents (Firefox, IE, etc.) to an exclusion list, so that a curious visitor doesn't end up blocking everyone who uses the same browser.
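The exclusion idea could look roughly like this. `BROWSER_TOKENS` and `safeToBlock()` are hypothetical, and the token list is an assumption; real browser user agents vary a lot, and a bad bot can fake any of them. A trap page would call `safeToBlock()` before adding an agent to its blocklist.

```php
<?php
// Assumed substrings that identify mainstream browsers.
const BROWSER_TOKENS = ["Firefox", "MSIE", "Trident", "Chrome", "Safari", "Opera"];

// Returns true only when the agent does not look like a mainstream browser,
// so a curious human who follows the trap link cannot get that browser banned.
function safeToBlock(string $userAgent): bool
{
    if ($userAgent === "") {
        return false; // nothing identifiable to block
    }
    foreach (BROWSER_TOKENS as $token) {
        if (stripos($userAgent, $token) !== false) {
            return false; // looks like a real browser: do not ban it
        }
    }
    return true;
}
```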

