mysterbx Posted April 8, 2009
Hi, is there a way to block any site or program (except my own) from crawling my site? Something with htaccess or PHP, I don't know. I want to protect my data from curl, file_get_contents, include, and so on.
schilly Posted April 8, 2009
You can use a robots.txt file to ask bots not to index your site. I'm not sure of the specifics; just google it.
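For readers unfamiliar with robots.txt, here is a minimal sketch of what a rule looks like and how a well-behaved crawler would read it, using Python's standard `urllib.robotparser`. The rule set, the `/private/` path, the `SomeBot` name, and the `example.com` URLs are all hypothetical. The check runs on the crawler's side, so it only restrains bots that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: ask every crawler to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a crawler that honors robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# A polite crawler would skip the first URL and fetch the second.
print(is_allowed("SomeBot", "http://example.com/private/data.html"))
print(is_allowed("SomeBot", "http://example.com/index.html"))
```

A script that never consults robots.txt is not affected by these rules at all.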
mysterbx Posted April 8, 2009 (Author)
"So don't try to use /robots.txt to hide information," says the robots.txt documentation, but that's exactly what I need...
schilly Posted April 8, 2009
Create a script that checks the user agent and then decides whether or not to load the content?
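A minimal sketch of that user-agent check, written in Python for illustration (in PHP the header would come from `$_SERVER['HTTP_USER_AGENT']`). The marker list and the `looks_automated` and `respond` helpers are hypothetical, not a vetted blocklist, and the check only sees whatever string the client chooses to send.

```python
# Hypothetical substrings that often appear in the User-Agent of scripted clients.
BOT_MARKERS = ("curl", "wget", "python-requests", "bot", "spider", "crawler")

def looks_automated(user_agent):
    """Guess whether a User-Agent header belongs to a script or crawler."""
    if not user_agent:
        return True  # assumption: treat a missing header as automated
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)

def respond(user_agent):
    """Return an (HTTP status, body) pair based on the User-Agent check."""
    if looks_automated(user_agent):
        return 403, "Forbidden"
    return 200, "Page content"

print(respond("curl/7.19.4"))
print(respond("Mozilla/5.0 (Windows NT 6.0) Firefox/3.0"))
```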
redarrow Posted April 8, 2009
<html> <head> <meta name="robots" content="noindex" /> <title>Don't index this page</title> </head>
mysterbx Posted April 9, 2009 (Author)
"create a script to check the user agent then decide whether to load or not load content?"
You can change your user agent or set a fake one with curl...
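To illustrate the point about fake user agents, here is a small Python sketch that builds (but does not send) a request with a browser-like User-Agent; in PHP's cURL bindings the equivalent setting is `CURLOPT_USERAGENT`. The `FAKE_BROWSER_UA` string, the `build_spoofed_request` helper, and the `example.com` URL are hypothetical.

```python
from urllib.request import Request

# Hypothetical browser-like string; the client can send any text here.
FAKE_BROWSER_UA = "Mozilla/5.0 (Windows NT 6.0; rv:1.9) Gecko/2009 Firefox/3.0"

def build_spoofed_request(url, user_agent=FAKE_BROWSER_UA):
    """Build (but do not send) a request carrying a client-chosen User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})

req = build_spoofed_request("http://example.com/")
# urllib stores header names capitalized, so the key is "User-agent".
print(req.get_header("User-agent"))
```

Because the server only ever sees this header value, a filter based on the user agent alone cannot tell this request apart from one sent by a real browser.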
PFMaBiSmAd Posted April 9, 2009
What exact problem are you having? The whole point of the Internet is to publish content. For your content to be visible on your pages, it must be accessible for the browser to request and render. Even if you use a member/login system, someone could create an account and then use cURL to log in and access the content.
mysterbx Posted April 9, 2009 (Author)
There is no problem, I just need protection. I need to block anything that is not human. I tried to access www.katz.cd with curl, using all kinds of methods and options, and couldn't get any reply from it. That's what I need... It's something with htaccess, I guess...