
[SOLVED] Robots.txt


DEVILofDARKNESS


I actually don't know where to put this.

I have the following robots.txt file (shown below).

I want to disallow everything except the pages listed with Allow.

This works fine, but now I also want to allow the homepage '/'.

But if I add Allow: /, Googlebot thinks everything is allowed. How can I solve this?

(Allowing just index.php doesn't work.)

 

User-agent: *

Disallow: /

Allow: /index.php

Allow: /indexx.php

Allow: /about.php

Allow: /tuto.php

Allow: /navigation.php

Allow: /footer.php


Okay.  First, you do understand that robots.txt only keeps "good robots" from going through your files and finding sensitive things.  It will not deter "bad robots".  All it does is tell a bot where it can and cannot crawl.  It is usually better not to allow crawling anywhere.

 

User-agent: *
Disallow: /

This is a good thing.

 

Now if you are worried about indexing issues... use the meta tags.

 

Example:

 

For the index page

<meta name="ROBOTS" content="INDEX, NOFOLLOW" />

 

For the rest of the pages, or as you see fit:

<meta name="ROBOTS" content="NOINDEX, NOFOLLOW" />

 

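These obviously go inside the <head> element of your web page.  Keep in mind that a crawler can only read them on pages it is allowed to fetch, so a page blocked by robots.txt never has its meta tags seen.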
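To see what a crawler would make of those tags, here is a minimal Python sketch that pulls the robots directives out of a page's HTML. It only uses the standard library `html.parser` module. The class and function names (`RobotsMetaParser`, `robots_directives`) are made up for this illustration, and real crawlers may handle more cases than this does.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        # The name is compared case-insensitively, so "ROBOTS" works too.
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots meta directives in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW" /></head>'
print(sorted(robots_directives(page)))
```

A page without such a tag gives an empty set, which crawlers generally treat as "index and follow".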

 

Does this help?


Here is a good example.

 

User-agent: Googlebot
Disallow: /
User-agent: googlebot-image
Disallow: /
User-agent: googlebot-mobile
Disallow: /
User-agent: MSNBot
Disallow: /
User-agent: Slurp
Disallow: /
User-agent: Teoma
Disallow: /
User-agent: twiceler
Disallow: /
User-agent: Gigabot
Disallow: /
User-agent: Scrubby
Disallow: /
User-agent: Robozilla
Disallow: /
User-agent: Nutch
Disallow: /
User-agent: ia_archiver
Disallow: /
User-agent: baiduspider
Disallow: /
User-agent: naverbot
Disallow: /
User-agent: yeti
Disallow: /
User-agent: yahoo-mmcrawler
Disallow: /
User-agent: psbot
Disallow: /
User-agent: asterias
Disallow: /
User-agent: yahoo-blogs/v3.9
Disallow: /
User-agent: *
Disallow: /
Disallow: /cgi-bin/
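If you want to check a file like this before uploading it, Python's standard `urllib.robotparser` module can parse the text and answer "may this agent fetch this URL?". The sketch below uses a trimmed copy of the file above (only three groups). `SomeOtherBot` and the `example.com` URL are placeholders, and `blocked_agents` is just a helper name for this example.

```python
from urllib.robotparser import RobotFileParser

# Trimmed copy of the example file: two named bots plus the catch-all group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
User-agent: MSNBot
Disallow: /
User-agent: *
Disallow: /
Disallow: /cgi-bin/
"""


def blocked_agents(robots_txt, agents, url):
    """Return the agents from `agents` that may NOT fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


agents = ["Googlebot", "MSNBot", "SomeOtherBot"]
print(blocked_agents(ROBOTS_TXT, agents, "http://example.com/index.php"))
```

All three agents come back as blocked: the two named ones through their own group, and the unlisted one through the `User-agent: *` group.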

 

Using Disallow: / will keep Google and similar search engines away from everything under the root, which means the whole site, so use it with caution.  If you want to narrow down where they look, it may be easier to list the folders and pages you want to disallow instead of the ones you want to allow. For example:

 

User-agent: *
Disallow: /folder
Allow: /folder/page.html
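Why the Allow line wins here, and why a plain "Allow: /" opens up everything, comes down to how the crawler picks between matching rules. The Python sketch below assumes the precedence that Google documents and RFC 9309 describes: the longest matching path wins, Allow wins a tie, `*` is a wildcard and a trailing `$` marks the end of the URL. Other crawlers may not support `$` or may simply use the first matching line, so treat this as an illustration only. The function names are invented for the example.

```python
import re


def pattern_to_regex(pattern):
    """Turn a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """rules is a list of ("allow" | "disallow", pattern) pairs.

    The longest matching pattern wins; on equal length, allow wins.
    A path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            if len(pattern) > best_len or (
                len(pattern) == best_len and kind == "allow"
            ):
                best_len = len(pattern)
                allowed = kind == "allow"
    return allowed


folder_rules = [("disallow", "/folder"), ("allow", "/folder/page.html")]
print(is_allowed(folder_rules, "/folder/page.html"))   # longer Allow wins
print(is_allowed(folder_rules, "/folder/other.html"))  # only Disallow matches

base = [("disallow", "/"), ("allow", "/index.php"), ("allow", "/about.php")]
print(is_allowed(base + [("allow", "/")], "/secret.php"))   # tie, Allow wins
print(is_allowed(base + [("allow", "/$")], "/"))            # homepage only
print(is_allowed(base + [("allow", "/$")], "/secret.php"))  # still blocked
```

So for the homepage question at the top of the thread, "Allow: /$" is an option for crawlers that understand `$`: it matches only "/" itself, while "Allow: /" has the same length as "Disallow: /" and ends up allowing every path.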

 

More info can be found here... http://www.robotstxt.org/

 

 
