
Robots.txt order


themistral


Hi guys,

 

I am currently evaluating a site, and its robots.txt file contains the following:

User-agent: Googlebot
Allow: /

User-agent: Slurp
Crawl-delay: 120
Allow: /

User-agent: Msnbot
Allow: /

User-agent: ia_archiver
Allow: /

User-agent: *
Disallow: /

 

I use a tool to check various things, and it flagged that robots.txt was disallowing bots.

 

I would like to check whether the order makes any difference: does that final User-agent: * group override the bot-specific rules?

 

Thanks  :D

https://forums.phpfreaks.com/topic/263016-robotstxt-order/

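No, the order of the groups doesn't matter to the major crawlers. Each bot uses the group whose User-agent line matches it most specifically and ignores the rest; the User-agent: * group is only a fallback for bots that aren't named anywhere else. So as written, Googlebot, Slurp, Msnbot and ia_archiver are allowed everywhere and every other bot is disallowed. Your tool is most likely just flagging the catch-all Disallow: / at the end.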
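If you want to see how a parser actually reads the file from the first post, here is a rough sketch using Python's standard-library urllib.robotparser. The bot name SomeShoppingBot and the path /products/ are made up for illustration, and this only shows how that one parser behaves; a real crawler may implement matching differently.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted in the first post.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Slurp
Crawl-delay: 120
Allow: /

User-agent: Msnbot
Allow: /

User-agent: ia_archiver
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text held in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser, bot, path="/products/"):
    """Ask the parser whether `bot` may fetch `path` (hypothetical path)."""
    return parser.can_fetch(bot, path)


original = build_parser(ROBOTS_TXT)

# Same groups, but with the catch-all "User-agent: *" group moved to the top.
groups = ROBOTS_TXT.strip().split("\n\n")
reordered = build_parser("\n\n".join([groups[-1]] + groups[:-1]) + "\n")

# SomeShoppingBot is a made-up name standing in for any bot not listed.
for bot in ("Googlebot", "Slurp", "Msnbot", "ia_archiver", "SomeShoppingBot"):
    print(bot, allowed(original, bot), allowed(reordered, bot))
```

With this parser the two orderings give the same answers, since the User-agent: * group is kept aside as a default and only consulted when no named group matches.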

 

Also, dunno if you know or not, but crawlers don't have to obey robots.txt.  Most major crawlers respect the rules, but they don't *have* to, and you shouldn't rely on robots.txt for the other random bots/crawlers out there.  Hell, I personally don't really rely on it even for the major crawlers.  Who knows when their policy might change.  Better to look at the request headers (e.g. User-Agent) server-side and act on that.  But even then, that doesn't stop someone from faking headers.  Anyways...
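As a rough idea of what a server-side check could look like, here is a small Python sketch. The function name is_blocked and the tokens "badbot" and "shopbot" are placeholders I made up, not real bot names, and since the User-Agent header can be faked this only catches bots that identify themselves honestly.

```python
def is_blocked(user_agent, blocked_tokens=("badbot", "shopbot")):
    """Return True if the User-Agent string contains a blocked token.

    `blocked_tokens` holds hypothetical lowercase substrings; replace them
    with whatever shows up in your own access logs.
    """
    ua = (user_agent or "").lower()  # a missing header is treated as empty
    return any(token in ua for token in blocked_tokens)


# Example: decide whether to refuse a request based on its header value.
print(is_blocked("Mozilla/5.0 (compatible; ShopBot/1.0)"))
print(is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```

In a real app you would call something like this early in request handling and return a 403 (or rate-limit) when it comes back True.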

https://forums.phpfreaks.com/topic/263016-robotstxt-order/#findComment-1348136

Cheers .josh

 

That's great to know as it explains a few things!

They implemented this because they were getting hammered by bots - it's an e-commerce site, so I would guess a shopping bot was the problem.

 

Yep I know bad bots will ignore robots.txt, and some pages of the site are included in Google's index, so I would guess that at best, the robots.txt file is confusing bots at the moment.

https://forums.phpfreaks.com/topic/263016-robotstxt-order/#findComment-1348143

