themistral Posted May 23, 2012

Hi guys,

I am evaluating a site currently and the robots.txt file has the following code:

User-agent: Googlebot
Allow: /

User-agent: Slurp
Crawl-delay: 120
Allow: /

User-agent: Msnbot
Allow: /

User-agent: ia_archiver
Allow: /

User-agent: *
Disallow: /

I use a tool to check various things and it flagged up that robots.txt was disallowing bots. I would like to check if the order makes any difference - does that final command override the bot-specific rules?

Thanks

.josh Posted May 23, 2012

No, the order of the groups doesn't matter. A crawler that follows the standard uses only the group whose User-agent line matches it most specifically and ignores the rest, so the final "User-agent: *" group with "Disallow: /" applies only to bots that don't have a group of their own. As written, Googlebot, Slurp, Msnbot and ia_archiver may crawl everything and every other bot is told to stay out, which is probably what your tool flagged.

Also dunno if you know or not, but crawlers don't have to obey the rules of robots.txt. Most major crawlers respect the rules, but they don't *have* to, and you shouldn't rely on robots.txt for the other random bots/crawlers out there. Hell, I personally don't really rely on it even for the major crawlers. Who knows when their policy might change. Better to look at the request headers server-side and do something there. But even then, that doesn't stop someone from faking headers. Anyways...
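
One way to test the ordering question is to feed the file from the first post to a parser and compare it with a reordered copy. The sketch below uses Python's standard urllib.robotparser module. The bot name "ExampleShoppingBot" and the path "/products" are made up for illustration, and this parser is only one implementation (it matches user-agent names by substring), so treat the result as an indication of how a well-behaved crawler reads the file, not as a guarantee for every bot.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt groups from the question, one string per group.
GROUPS = [
    "User-agent: Googlebot\nAllow: /",
    "User-agent: Slurp\nCrawl-delay: 120\nAllow: /",
    "User-agent: Msnbot\nAllow: /",
    "User-agent: ia_archiver\nAllow: /",
    "User-agent: *\nDisallow: /",
]


def build_parser(groups):
    """Join the groups with blank lines and parse them as one robots.txt file."""
    parser = RobotFileParser()
    parser.parse("\n\n".join(groups).splitlines())
    return parser


original = build_parser(GROUPS)
reordered = build_parser(list(reversed(GROUPS)))  # catch-all group comes first

for name, parser in (("original", original), ("reordered", reordered)):
    print(
        name,
        parser.can_fetch("Googlebot", "/products"),           # named bot
        parser.can_fetch("ExampleShoppingBot", "/products"),  # hypothetical unnamed bot
        parser.crawl_delay("Slurp"),
    )
```

Both lines print True, False and 120: the named bots keep their own rules and the unnamed bot falls through to the catch-all group, wherever that group sits in the file.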

themistral Posted May 23, 2012

Cheers .josh

That's great to know as it explains a few things! They implemented this as they were getting hammered by bots - it's an ecom site so I would guess a shopping bot was the problem.

Yep, I know bad bots will ignore robots.txt, and some pages of the site are included in Google's index, so I would guess that at best, the robots.txt file is confusing bots at the moment.
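
For the suggestion earlier in the thread to look at the headers server-side, a minimal sketch of a User-Agent check might look like the following. The blocked substrings and the function name are hypothetical, the headers are assumed to arrive as a plain dict with a "User-Agent" key, and, as already noted, a bot that fakes its headers will get straight past it.

```python
# Hypothetical substrings to refuse; a real list would come from the server logs.
BLOCKED_AGENT_SUBSTRINGS = ("exampleshoppingbot", "examplescraper")


def is_blocked_agent(headers):
    """Return True if the request's User-Agent contains a blocked substring."""
    user_agent = headers.get("User-Agent", "").lower()
    return any(token in user_agent for token in BLOCKED_AGENT_SUBSTRINGS)


if __name__ == "__main__":
    print(is_blocked_agent({"User-Agent": "ExampleShoppingBot/1.0"}))
    print(is_blocked_agent({"User-Agent": "Mozilla/5.0 (Windows NT 6.1)"}))
```

The server would call this before doing any real work and answer blocked requests with a 403 or similar. It only covers bots that identify themselves honestly.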