I have a robots.txt in the site's root directory with the following content:

User-agent: *
Disallow: /

From what I've read, this means no crawler, Google included, will read the content of my site, and bad crawlers won't be able to see the site's directories, which should be good for security. But when I search for the site on Google, the result shows:

A description for this result is not available because of this site's robots.txt – learn more.

So my question is: how can a search engine know the site's description if you don't allow its crawler to access the protected directory where all your files are?

While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web.

Basically, if Google finds links to your site, it will still index the URLs, just not any of the pages' contents. So based on the URL, or on the text of the pages that link to you, Google still matched your URL and included it in the results, but because of your robots.txt it doesn't have any of the page's content from which to build a good description of the site.
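To see why a rule-abiding crawler ends up with no page content, here is a minimal sketch that evaluates the two rules from the question with Python's standard `urllib.robotparser`. The `example.com` URLs, the paths, and the `ExampleBot` name are placeholders for illustration, not details from the original site.

```python
from urllib.robotparser import RobotFileParser

# The exact rules from the question: every crawler, every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that obeys robots.txt may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com, the paths, and ExampleBot are hypothetical placeholders.
for path in ("/", "/index.html", "/checkout/pay.php"):
    url = "https://example.com" + path
    print(path, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

Every check comes back disallowed, so a compliant crawler never downloads a page and has nothing to build a description from. The URL itself can still be learned from links on other sites.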

 

According to Google, to keep the site out of search results entirely you need to use the noindex meta tag or HTTP header:

To entirely prevent a page's contents from being listed in the Google web index even if other sites link to it, use a noindex meta tag or x-robots-tag. As long as Googlebot fetches the page, it will see the noindex meta tag and prevent that page from showing up in the web index. The x-robots-tag HTTP header is particularly useful if you wish to limit indexing of non-HTML files like graphics or other kinds of documents.
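As a rough illustration of the header variant, the sketch below is a tiny WSGI application, built only on Python's standard library, that attaches `X-Robots-Tag: noindex` to its response. The page body, the function names, and port 8000 are assumptions made for the example; a real site would normally set this header in its web server or framework configuration.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Serve a page that crawlers may fetch but are asked not to index."""
    body = b"<html><body>Account area</body></html>"  # placeholder content
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Header equivalent of <meta name="robots" content="noindex">.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [body]


def run_local_server(port: int = 8000) -> None:
    """Start a local test server; the port number is an arbitrary choice."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

Calling `run_local_server()` and requesting `http://127.0.0.1:8000/` should show the header in the response. For the header to have any effect, the page must not also be blocked in robots.txt, since the crawler has to fetch the page to see it.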

Note that the above is specific to Google; other search engines may handle things differently. Is there any particular reason why you wish to block your entire site from being crawled by search engines?

Thank you for your explanation, it's clearer to me now.

Is there any particular reason why you wish to block your entire site from being crawled by search engines?

No, not really to block search engine crawlers, but to block some bad crawlers. I forgot the link, but I read that robots can also set up a session on your website. So right now I'm still confused about whether I should just specify which robots can access certain directories of the site.

What I'm developing is a payment system website, so I'm kind of nervous about committing to something I don't really understand.

Thanks again for your explanation.

No, not really to block search engine crawlers, but to block some bad crawlers.

A bad crawler is going to flat out ignore your robots.txt file and crawl your site anyway. The only thing a robots.txt file is good for is to indicate to a good crawler which paths you would prefer it not crawl.
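For the "which robots can access which directory" part of the question, a sketch of per-crawler rules might look like the following, again checked with Python's standard `urllib.robotparser`. The policy itself, the `/admin/` path, `example.com`, and the `OtherBot` name are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: Googlebot may crawl everything except /admin/,
# while every other crawler is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /admin/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
checks = [
    ("Googlebot", "https://example.com/index.html"),
    ("Googlebot", "https://example.com/admin/users"),
    ("OtherBot", "https://example.com/index.html"),
]
for agent, url in checks:
    print(agent, url, parser.can_fetch(agent, url))
```

These answers only describe what an obedient crawler would do. Nothing in robots.txt stops a client from requesting `/admin/` directly, so anything sensitive on a payment site still needs real access control on the server.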
