Robots.txt order


themistral

Hi guys,

 

I am currently evaluating a site, and its robots.txt file contains the following:

User-agent: Googlebot
Allow: /

User-agent: Slurp
Crawl-delay: 120
Allow: /

User-agent: Msnbot
Allow: /

User-agent: ia_archiver
Allow: /

User-agent: *
Disallow: /

 

I use a tool to check various things, and it flagged up that the robots.txt file was disallowing bots.

 

I would like to check whether the order makes any difference - does that final catch-all group override the bot-specific rules?

 

Thanks  :D


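No, the order of the groups doesn't matter.  Each crawler follows the group whose User-agent line matches it most specifically, and only falls back to the User-agent: * group if no named group matches.  So with that file Googlebot, Slurp, Msnbot and ia_archiver are allowed everywhere and every other bot is disallowed.  Your tool is most likely just flagging that final catch-all group.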
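
If you want to see how a standards-following parser reads that file, here is a rough sketch using Python's built-in urllib.robotparser. The example.com URL and the "SomeShoppingBot" name are made up for illustration, and other crawlers may not parse things exactly the way this one does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the first post, copied as-is.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Slurp
Crawl-delay: 120
Allow: /

User-agent: Msnbot
Allow: /

User-agent: ia_archiver
Allow: /

User-agent: *
Disallow: /
"""

# "SomeShoppingBot" is a hypothetical bot with no group of its own.
AGENTS = ["Googlebot", "Slurp", "Msnbot", "ia_archiver", "SomeShoppingBot"]
URL = "https://example.com/products/1"  # placeholder URL


def verdicts(robots_txt, agents, url):
    """Return {agent: allowed?} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


original = verdicts(ROBOTS_TXT, AGENTS, URL)

# Same groups, but with the catch-all group moved to the top of the file.
groups = ROBOTS_TXT.strip().split("\n\n")
reordered = verdicts("\n\n".join([groups[-1]] + groups[:-1]), AGENTS, URL)

for agent, allowed in original.items():
    print(agent, "allowed" if allowed else "blocked")
print("same result after reordering:", reordered == original)
```

With this parser the four named bots come out allowed and the unnamed one comes out blocked, whichever position the catch-all group is in.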

 

Also, dunno if you know or not, but crawlers don't have to obey robots.txt.  Most major crawlers respect it, but they don't *have* to, and you shouldn't rely on robots.txt for the other random bots/crawlers out there.  Hell, I personally don't really rely on it even for the major crawlers.  Who knows when their policy might change.  Better to look at the request headers (such as User-Agent) server-side and do something there.  But even then, that doesn't stop someone from faking headers.  Anyways...
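
As a rough idea of what a server-side header check could look like, here is a minimal WSGI sketch in Python. The blocked substrings and the tiny "shop" app are placeholders I made up, not anything from the site in question, and as said above a bot that fakes its User-Agent will walk straight past it.

```python
# Hypothetical substrings to look for in the User-Agent header.
BLOCKED_SUBSTRINGS = ("badbot", "scraper")


def block_bots(app, blocked=BLOCKED_SUBSTRINGS):
    """Wrap a WSGI app and answer 403 when the User-Agent looks blocked."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Anything else is passed through to the real application.
        return app(environ, start_response)
    return middleware


def shop(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Welcome\n"]


application = block_bots(shop)
```

Any WSGI server can serve `application`; the same check could just as well live in the web server config instead.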


Cheers .josh

 

That's great to know as it explains a few things!

They implemented this as they were getting hammered by bots - it's an ecom site so I would guess a shopping bot was the problem.

 

Yep I know bad bots will ignore robots.txt, and some pages of the site are included in Google's index, so I would guess that at best, the robots.txt file is confusing bots at the moment.

