lemmin Posted March 11, 2008
A PHP script with file_get_contents("http://www.mysite.com") would download the content of that page. If I wanted to stop these bots, what are some of the most secure techniques? I'm looking for something like:

if (isbot)
{
    header("Location: 404.php");
}
// normal page code from here on

But what goes in the if statement? Thanks.
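One answer to "what goes in the if statement" is to inspect the User-Agent header. Note the client controls this header, so the check only stops bots that don't bother to fake it. The `isbot()` helper below is a hypothetical name and the signature list is illustrative, not exhaustive; this is a minimal sketch, not a robust defense:

```php
<?php
function isbot()
{
    // file_get_contents() sends no User-Agent header by default,
    // so a missing header is itself a strong hint of a naive script.
    if (!isset($_SERVER['HTTP_USER_AGENT']) || $_SERVER['HTTP_USER_AGENT'] === '') {
        return true;
    }
    // Illustrative substrings seen in common crawler/script User-Agents.
    $signatures = array('bot', 'crawl', 'spider', 'curl', 'wget', 'libwww');
    foreach ($signatures as $sig) {
        if (stripos($_SERVER['HTTP_USER_AGENT'], $sig) !== false) {
            return true;
        }
    }
    return false;
}

// Usage at the top of a page:
// if (isbot()) {
//     header('HTTP/1.0 404 Not Found');
//     exit;
// }
```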
lemmin Posted March 17, 2008 (Author)
I'm going to bump this since it has like 10 views.
TheUnknownCylon Posted March 17, 2008
The official way is to create a robots.txt file and put it on your server. Otherwise you can maintain a database of known bots and exclude them from your server, or use your .htaccess file. Robots.txt on Wikipedia: http://en.wikipedia.org/wiki/Robots.txt
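For completeness, a robots.txt is just a plain-text file in the site root; the /private/ path below is only an example. Well-behaved crawlers honor it, but it is purely advisory, so a script like the one in the first post will ignore it entirely:

```
User-agent: *
Disallow: /private/
```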
lemmin Posted March 17, 2008 (Author)
Cool, thanks for that information; however, I'm not necessarily referring to those types of bots. If someone made a PHP script similar to the one I posted above, they would get through without being considered a bot, wouldn't they? The obvious way to prevent this is to check the Referer header, but isn't there a way for the client to send a different header? I'm pretty sure they can't just use the header() function to send it across domains, but there has to be another way to do it, right? Is there a more secure way to catch this? Thanks again.
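On the Referer idea: the header is supplied by the client, so a script can send any value it likes, e.g. via a stream context passed to file_get_contents() or cURL's CURLOPT_REFERER. A check like the sketch below therefore only filters naive scripts, and it also rejects legitimate visitors who type the URL directly, since they send no Referer either. `has_local_referer()` is a hypothetical helper name:

```php
<?php
// Spoofable check: accept only requests whose Referer host matches ours.
function has_local_referer($host)
{
    if (!isset($_SERVER['HTTP_REFERER'])) {
        return false; // file_get_contents() sends no Referer by default
    }
    $ref_host = parse_url($_SERVER['HTTP_REFERER'], PHP_URL_HOST);
    return is_string($ref_host) && strcasecmp($ref_host, $host) === 0;
}
```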
thebadbad Posted March 17, 2008
As far as I know, there is no way to prevent scripts from reading files like that. I tried setting a session and then checking whether it's set, but the script was able to do that too, through file_get_contents(). Cookies, however, might distinguish scripts from actual users, but that requires every user to have cookies enabled in order to see your page.
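One way to act on the cookie idea is a cookie round-trip: on the first request, set a cookie and redirect back to the same URL; only clients that store and resend the cookie get the real page. A bare file_get_contents() call never passes, but, as noted, neither do users with cookies disabled. A sketch, where the cookie name 'seen' is arbitrary:

```php
<?php
// Cookie round-trip gate: only clients that store and resend cookies pass.
function passed_cookie_check()
{
    return isset($_COOKIE['seen']);
}

// Usage at the top of the page:
// if (!passed_cookie_check()) {
//     setcookie('seen', '1');
//     header('Location: ' . $_SERVER['REQUEST_URI']); // browser retries, now with the cookie
//     exit; // file_get_contents() follows the redirect but never stores the cookie
// }
// ...normal page code here...
```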
lemmin Posted March 17, 2008 (Author)
Cookies are a good idea, I will look into that, thanks.
Archived
This topic is now archived and is closed to further replies.