DEVILofDARKNESS Posted September 10, 2009

I'm not sure where to put this. I have the robots.txt file below, and I want to disallow everything except the files listed under Allow. This works fine, but now I also want to allow the homepage '/'. If I add Allow: /, Googlebot thinks everything is allowed. How can I solve this? (Allowing just index.php doesn't work.)

User-agent: *
Disallow: /
Allow: /index.php
Allow: /indexx.php
Allow: /about.php
Allow: /tuto.php
Allow: /navigation.php
Allow: /footer.php
sawade Posted September 10, 2009

So are you trying to allow index.php or not? I am confused. You have disallowed everything in the root and are then listing specific files to allow. Have you included a robots directive in your meta tags?
DEVILofDARKNESS Posted September 10, 2009

Hm? No, it's just a file on my server. I'll try to explain: I have the site http://www.ninv.be and I want to allow that (just the site name) but disallow all the rest. I don't know how to allow the site name itself... And sorry about the missing code tags, I thought they weren't necessary in this case.
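For crawlers that support the `$` end-of-URL anchor (Google documents this extension), one way to express "only the homepage" is to pair `Disallow: /` with `Allow: /$`. The Python sketch below is only an illustration of the longest-match rule that Google describes (the longest matching pattern wins, and Allow wins a tie); `pattern_to_regex` and `is_allowed` are hypothetical helper names, not part of any library, and other crawlers may evaluate rules differently.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_allowed(rules, path):
    """Longest-match evaluation: the longest matching pattern decides,
    and Allow wins when an Allow and a Disallow pattern have equal length."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# Equivalent to:  User-agent: *  /  Disallow: /  /  Allow: /$
RULES = [("disallow", "/"), ("allow", "/$")]

for path in ("/", "/index.php", "/about.php"):
    print(path, is_allowed(RULES, path))
```

Under the same rule, `Allow: /` and `Disallow: /` are equally long, so Allow wins the tie for every URL, which matches the behaviour described in the first post.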
sawade Posted September 10, 2009

Okay. First, you do understand that robots.txt is a way to keep "good robots" from going through your files and finding sensitive things. It will not dissuade "bad robots". All it does is tell a bot where it can and cannot crawl. It is usually better to not allow crawling anywhere.

User-agent: *
Disallow: /

This is a good thing. Now, if you are worried about indexing issues, use the meta tags. Example:

For the index page:
<meta name="ROBOTS" content="INDEX, NOFOLLOW" />

For the rest of the pages, as you see fit:
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW" />

These go inside the <head> element of your web page. Does this help?
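To make the meta-tag approach concrete, here is a small Python sketch that reads the robots meta tag from a page the way a crawler conceptually would. The class name `RobotsMetaParser` and the function `robots_directives` are made up for this illustration; only `html.parser.HTMLParser` comes from the standard library. Note that a crawler can only see this tag on pages it is permitted to fetch, so a page blocked in robots.txt never gets its meta tag read.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                token.strip().lower() for token in content.split(",") if token.strip()
            )

def robots_directives(html):
    """Return the set of lower-cased robots directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

index_page = '<head><meta name="ROBOTS" content="INDEX, NOFOLLOW" /></head>'
other_page = '<head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW" /></head>'

print(sorted(robots_directives(index_page)))
print(sorted(robots_directives(other_page)))
```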
DEVILofDARKNESS Posted September 10, 2009

Uh-huh, so if you disallow all the robots, will Google still be able to update your site information?
sawade Posted September 10, 2009

To allow Google (an empty Disallow means nothing is blocked for that crawler):

User-agent: Googlebot
Disallow:
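A quick way to check how such a group interacts with a catch-all block is Python's standard `urllib.robotparser`. The sketch below assumes the crawler token is `Googlebot` and uses a made-up name, `SomeOtherBot`, for every other crawler; the robots.txt content is an illustration, not the file from the site discussed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot gets an empty Disallow (nothing blocked),
# every other crawler falls through to the catch-all group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler uses the group that names it and ignores the '*' group.
googlebot_ok = parser.can_fetch("Googlebot", "/about.php")
other_ok = parser.can_fetch("SomeOtherBot", "/about.php")

print("Googlebot may fetch /about.php:", googlebot_ok)
print("SomeOtherBot may fetch /about.php:", other_ok)
```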
sawade Posted September 10, 2009

Here is a good example.

User-agent: Googlebot
Disallow: /

User-agent: googlebot-image
Disallow: /

User-agent: googlebot-mobile
Disallow: /

User-agent: MSNBot
Disallow: /

User-agent: Slurp
Disallow: /

User-agent: Teoma
Disallow: /

User-agent: twiceler
Disallow: /

User-agent: Gigabot
Disallow: /

User-agent: Scrubby
Disallow: /

User-agent: Robozilla
Disallow: /

User-agent: Nutch
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: baiduspider
Disallow: /

User-agent: naverbot
Disallow: /

User-agent: yeti
Disallow: /

User-agent: yahoo-mmcrawler
Disallow: /

User-agent: psbot
Disallow: /

User-agent: asterias
Disallow: /

User-agent: yahoo-blogs/v3.9
Disallow: /

User-agent: *
Disallow: /
Disallow: /cgi-bin/

It is true that Disallow: / keeps Google and similar search engines out of the root, so use it with caution. If you want to narrow down where they look, it may be easier to list the folders and pages you want to disallow instead of listing what to allow. Example:

User-agent: *
Disallow: /folder
Allow: /folder/page.html

More info can be found here: http://www.robotstxt.org/
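One caveat with mixing Disallow and Allow for the same folder: parsers that follow the older first-match convention (the first rule that matches a URL decides) read the rules in file order, so an Allow placed after a broader Disallow never takes effect for them. The Python sketch below illustrates that convention only; `first_match_allows` is a hypothetical helper written for this example, and crawlers that use longest-match evaluation (as Google documents) are not affected by the order.

```python
def first_match_allows(rules, path):
    """Return the verdict of the first rule whose prefix matches `path`."""
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            return directive == "allow"
    return True  # no rule matched: crawling is allowed by default

# Same two rules, in the two possible orders.
disallow_first = [("disallow", "/folder"), ("allow", "/folder/page.html")]
allow_first = [("allow", "/folder/page.html"), ("disallow", "/folder")]

print(first_match_allows(disallow_first, "/folder/page.html"))  # blocked by the earlier Disallow
print(first_match_allows(allow_first, "/folder/page.html"))     # the Allow is seen first
print(first_match_allows(allow_first, "/folder/other.html"))    # still blocked
```

Listing the Allow line before the Disallow line gives the intended result under both conventions.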
sawade Posted September 10, 2009

"Uhu, so if you disallow all the robots, google will still be able to update your site information?"

Yes. I always use User-agent: * and Disallow: / and I have no problems with being found in search engines.