doni49 Posted December 9, 2006

I have an error handler (originally from a book's companion sample code). Whenever a PHP error occurs, I receive a message telling me about it. I've been receiving messages that make me believe a bot/spider is reading the directories and files under my web folder and trying to access the files. One of the latest messages indicates that it was trying to access a file that I've been using for TESTING PURPOSES ONLY. I haven't created EVEN ONE LINK to it [b]ANYWHERE[/b].

This latest message listed the following as its user agent:

[quote][HTTP_USER_AGENT] => MJ12bot/v1.0.8 (http://majestic12.co.uk/bot.php?+)[/quote]

I don't mind bots/spiders reading my files to index my site (without them the search engines can't find me :)), but is there any way to prevent them from getting a directory listing of my files and force them to ONLY access files via links?

TIA!
the_oliver Posted December 10, 2006

Robots/bots/spiders/whatever are expected to obey a robots.txt file telling them what they can or cannot look at.

Create a file called robots.txt in the root directory of your site. The file works like this:

To stop all robots seeing anything:

User-agent: *
Disallow: /

To stop Google seeing anything:

User-agent: Googlebot
Disallow: /

To stop all robots/spiders seeing into particular folders:

User-agent: *
Disallow: /cgi-bin/
Disallow: /testfile/

Paste whichever of these you need into robots.txt; compliant robots fetch that file first to see what they can or can't do.

Hope that helps
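To see how a crawler that honours robots.txt reads the folder-blocking example above, here is a minimal sketch using Python's standard-library urllib.robotparser. It is only an illustration: real crawlers use their own parsers and may differ in details, and the host www.example.com and the sample paths are placeholders, not anything from the original site.

```python
# Check how a crawler that honours robots.txt would read a set of rules,
# using Python's standard-library parser (real crawlers use their own).
from urllib.robotparser import RobotFileParser

# The folder-blocking example: keep all robots out of /cgi-bin/ and /testfile/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /testfile/
"""


def allowed(user_agent: str, path: str, rules: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler obeying `rules` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # www.example.com is a placeholder host; only the path is matched.
    return parser.can_fetch(user_agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/index.php", "/testfile/test.php", "/cgi-bin/script.cgi"):
        print(path, "allowed" if allowed("MJ12bot", path) else "disallowed")
```

Anything not matched by a Disallow line stays crawlable, so the index page and the pages linked from it remain indexable while /testfile/ is skipped. The file is advisory only: a bot that ignores it is not stopped by it.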
doni49 (Author) Posted December 10, 2006

Since I want search engines to be able to index my site, how can I get this to allow them to read the index file, any files linked from it, any files linked from those, and so on?

I thought robots.txt only worked on bots that HONORED the file's instructions. So the second question is: how can I force bots to honor this file's instructions?
the_oliver Posted December 10, 2006

I'm not sure of the answer to the first question, but I seem to remember Google published a good article on this.

As far as I am aware, bots will always be designed to 'honor' this, but there may be one or two that don't. However, I would imagine these come only from small sites that in reality are not getting much traffic. There is no way to force them to honor it. If you know of a specific bot, you can block it entirely. Protecting your testing files with an .htaccess file might also stop them; it would certainly stop anyone who follows a link from seeing the content.
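Blocking a specific bot on the server side does not depend on the bot honouring robots.txt. The sketch below is a hypothetical Python WSGI middleware (the names block_bots, BLOCKED_AGENTS and site are made up for illustration) that answers 403 Forbidden when the User-Agent header contains a listed token. On a PHP/Apache site the equivalent check would typically live in the server configuration (for example an .htaccess rule) or in the PHP script itself. It only stops bots that identify themselves honestly, because the User-Agent header is supplied by the client and can be faked.

```python
# Hypothetical server-side filter: refuse requests from named bots,
# whether or not they honour robots.txt.
from typing import Callable, Iterable

# Assumption: lowercase substrings to look for in the User-Agent header.
BLOCKED_AGENTS = ("mj12bot",)


def block_bots(app: Callable, blocked: Iterable[str] = BLOCKED_AGENTS) -> Callable:
    """Wrap a WSGI app so requests from matching user agents get 403 Forbidden."""
    tokens = tuple(token.lower() for token in blocked)

    def wrapper(environ, start_response):
        # The header is supplied by the client, so a bot can evade this by changing it.
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return wrapper


def site(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = block_bots(site)

if __name__ == "__main__":
    # Simulate two requests without starting a server.
    for agent in ("MJ12bot/v1.0.8 (http://majestic12.co.uk/bot.php?+)", "Mozilla/5.0"):
        application({"HTTP_USER_AGENT": agent}, lambda status, headers: print(agent, "->", status))
```

Matching on a lowercase substring means a single "mj12bot" token also covers version strings such as "MJ12bot/v1.0.8 (...)".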