
Prevent Bots/Spiders from getting directory listing?


doni49


I have an error handler (it originally came from a book's associated sample code).  Whenever a PHP error occurs, I receive a message telling me about it.

I've been receiving messages that make me believe a bot/spider is reading the directories and files under my web folder and trying to access the files.  One of the latest messages indicates that it was trying to access a file that I've been using for TESTING PURPOSES ONLY.  I haven't created EVEN ONE LINK to it [b]ANYWHERE[/b].

This latest message had the following listed as its user agent:
[quote]
[HTTP_USER_AGENT] => MJ12bot/v1.0.8 (http://majestic12.co.uk/bot.php?+)
[/quote]

I don't mind bots/spiders reading my files to index my site (because without them the search engines can't find me :)), but is there any way to prevent them from getting a directory listing of my files and force them to access files ONLY via links?

TIA!


Robots/bots/spiders/whatever are expected to obey a robots.txt file telling them what they can or cannot look at.

Create a file called robots.txt in the root directory of your site.  The file works like this:

To stop all robots seeing anything:

User-agent: *
Disallow: /

To stop Google seeing anything:

User-agent: Googlebot
Disallow: /

To stop any robots/spiders seeing into particular folders:

User-agent: *
Disallow: /cgi-bin/
Disallow: /testfile/


Simply paste whichever of these you need into robots.txt; robots will look there first to see what they can and can't do.
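If you want to check how a compliant robot would read those rules before uploading the file, here is a minimal sketch using Python's standard `urllib.robotparser`. The `example.com` URLs and the `page.php` file name are placeholders I made up, not anything from the original site, and this only shows what a bot that honors robots.txt would do.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the folder example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /testfile/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# example.com and the file names are placeholders for illustration only.
index_allowed = parser.can_fetch("MJ12bot", "http://example.com/index.php")
test_allowed = parser.can_fetch("MJ12bot", "http://example.com/testfile/page.php")

print("index.php allowed:", index_allowed)
print("testfile/page.php allowed:", test_allowed)
```

Anything not matched by a Disallow line stays crawlable, so the rest of the site can still be indexed.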

Hope that helps.

Since I want search engines to be able to index my site, how can I get this to allow them to read the index file and any files that are linked from it and any files linked from them and so on?

I thought robots.txt only worked on bots that HONORED the file's instructions.  So the second question is how to force bots to honor this file's instructions?

I'm not sure of the answer to the first question, but I seem to remember Google published a good article on this.

As far as I am aware, bots are generally designed to honor this, but there may be one or two that don't.  However, I would imagine that these only come from small sites that in reality are not getting much traffic.  There is no way to force them to honor it.  If you know of a specific bot, you could block it entirely.  It might also be the case that using an .htaccess file for your testing stuff would stop them.  It would certainly stop anyone from following a link and displaying the content.
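To illustrate the "block a specific bot" idea, here is a rough sketch in Python of matching on the user agent string. The names `BLOCKED_AGENT_TOKENS`, `is_blocked_agent` and `status_for` are made up for this example; on a real site the same check would more likely live in the web server configuration or in the PHP code, and a bot can always lie about its user agent.

```python
# Hypothetical blocklist: lowercase substrings to look for in the user agent.
BLOCKED_AGENT_TOKENS = ("mj12bot",)


def is_blocked_agent(user_agent: str, blocked=BLOCKED_AGENT_TOKENS) -> bool:
    """Return True if the user agent contains any blocked token (case-insensitive)."""
    ua = user_agent.lower()
    return any(token in ua for token in blocked)


def status_for(user_agent: str) -> int:
    """Pick an HTTP status: 403 Forbidden for blocked bots, 200 OK otherwise."""
    return 403 if is_blocked_agent(user_agent) else 200


# The user agent string reported in the first post of this thread.
reported = "MJ12bot/v1.0.8 (http://majestic12.co.uk/bot.php?+)"
print(status_for(reported))
print(status_for("Mozilla/5.0"))
```

Unlike robots.txt, a check like this is enforced on the server side, so it does not depend on the bot choosing to cooperate.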

Archived

This topic is now archived and is closed to further replies.
