
Prevent Bots/Spiders from getting directory listing?


doni49


I have an error handler (it originally came from a book's associated sample code).  Whenever a PHP error occurs, I receive a message telling me about it.

I've been receiving messages that make me believe a bot/spider is reading the directories and files under my web folder and trying to access the files.  One of the latest messages indicates that it was trying to access a file that I've been using for TESTING PURPOSES ONLY.  I haven't created EVEN ONE LINK to it [b]ANYWHERE[/b].

This latest message had the following listed as its user agent:
[quote]
[HTTP_USER_AGENT] => MJ12bot/v1.0.8 (http://majestic12.co.uk/bot.php?+)
[/quote]

I don't mind bots/spiders reading my files to index my site (because without them the search engines can't find me :)), but is there any way to prevent them from getting a directory listing of my files and force them to ONLY access files via links?

TIA!



Robots/bots/spiders/whatever are supposed to obey a robots.txt file that tells them what they can or cannot look at.

Create a file called robots.txt in the root directory of your site.  The file works like this:

To stop all robots seeing anything:

User-agent: *
Disallow: /

To stop Google seeing anything:

User-agent: Googlebot
Disallow: /

To stop any robots/spiders seeing into particular folders:

User-agent: *
Disallow: /cgi-bin/
Disallow: /testfile/


Simply paste whichever of these you need into robots.txt; well-behaved robots will look there first to see what they can or can't do.
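
If it helps to see how a rule-following crawler would read the folder example above, here is a minimal sketch using Python's standard `urllib.robotparser`. The paths being checked (`/index.php`, `/testfile/scratch.php`, `/cgi-bin/form.cgi`) and the helper name `is_allowed` are made up for illustration; only the rules themselves come from this post.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the folder example above: every crawler is asked
# to stay out of two folders; everything else is left crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /testfile/
"""


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if a crawler that follows robots.txt may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths, used only to exercise the rules.
    for path in ("/index.php", "/testfile/scratch.php", "/cgi-bin/form.cgi"):
        print(path, is_allowed("MJ12bot", path))
```

Note that anything not listed under Disallow stays allowed, so the index page and whatever it links to can still be indexed. This only models what a polite crawler decides for itself; the file does not block any request on the server.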

Hope that helps

Since I want sites to be able to index my site, how can I get this to allow them to read the index file and any files that are linked from it and any files linked from them and so on?

I thought robots.txt only worked on bots that HONORED the file's instructions.  So the second question is how to force bots to honor this file's instructions?

I'm not sure of the answer to the first question, but I seem to remember Google published a good article on this.

As far as I am aware, bots are generally designed to honor this, but there may be one or two that don't.  However, I would imagine that these are only from small sites that in reality are not getting much traffic.  There is no way to force them to honor it.  If you know of a specific bot, you could block it entirely.  It might also be the case that using an .htaccess file for your testing stuff would stop them.  It would certainly stop anyone from following a link and displaying the content.
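
To give a rough idea of what blocking a specific bot looks like, here is a small sketch of the logic in Python, written as a WSGI wrapper. It is not how the original poster's PHP site is set up; the block list, the `block_bots` wrapper, the stand-in `site` app, and the `status_for` helper are all hypothetical names for illustration. The idea is just: look at the User-Agent header and answer 403 if it matches a name you want to refuse.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical block list: substrings of User-Agent values to refuse.
BLOCKED_AGENT_SUBSTRINGS = ("MJ12bot",)


def block_bots(app):
    """Wrap a WSGI app so that requests from listed agents get a 403."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(name.lower() in agent for name in BLOCKED_AGENT_SUBSTRINGS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Stand-in for the real site; always answers 200."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def status_for(user_agent: str) -> str:
    """Send one fake request through the wrapped app; return the status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    seen = []

    def start_response(status, headers):
        seen.append(status)

    b"".join(block_bots(site)(environ, start_response))
    return seen[0]
```

Bear in mind that a User-Agent string is whatever the client chooses to send, so a check like this only stops bots that identify themselves honestly.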
