
Bot crawlers


lemmin


A PHP script that calls file_get_contents("http://www.mysite.com") would download the content of that page. If I wanted to stop bots like that, what are the most reliable techniques? I'm looking for something like:

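if ($isBot) // placeholder: how should $isBot be determined?
{
   header("Location: 404.php");
   exit; // stop here so the page body is not sent along with the redirect
}
// Normal page code from here on.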

But what goes in the if statement?

Thanks.

Link to comment
https://forums.phpfreaks.com/topic/95714-bot-crawlers/

The official way is to create a robots.txt file and put it in the root of your server.

 

Otherwise you could keep a database of known bots and check each request against it to block them, or add blocking rules to your .htaccess file.
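For the known-bots idea, here is a minimal sketch of what could go in the if statement. The helper name isKnownBot() and the pattern list are made up for illustration, and treating an empty User-Agent as a bot is an assumption (as far as I know, file_get_contents() sends no User-Agent unless one is configured). It only catches clients that identify themselves honestly.

```php
<?php
// Hypothetical helper: returns true when the User-Agent matches a known-bot pattern.
function isKnownBot(string $userAgent, array $botPatterns): bool
{
    // Assumption: an empty User-Agent is treated as a bot. That is a policy choice.
    if ($userAgent === '') {
        return true;
    }
    foreach ($botPatterns as $pattern) {
        // stripos() does a case-insensitive substring search.
        if (stripos($userAgent, $pattern) !== false) {
            return true;
        }
    }
    return false;
}

// Illustrative list only; a real one would live in a database or config file.
$botPatterns = ['googlebot', 'bingbot', 'curl', 'wget'];

// Usage in a page (commented out so the sketch runs from the command line):
// if (isKnownBot($_SERVER['HTTP_USER_AGENT'] ?? '', $botPatterns)) {
//     header('HTTP/1.1 403 Forbidden');
//     exit;
// }
```

Anyone who sets a browser-like User-Agent will pass this check, so it is a filter for well-behaved bots rather than a security measure.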

 

robots.txt on Wikipedia: http://en.wikipedia.org/wiki/Robots.txt

 

Link to comment
https://forums.phpfreaks.com/topic/95714-bot-crawlers/#findComment-494237

Cool, thanks for that information. However, I'm not necessarily referring to those types of bots. If someone wrote a PHP script similar to the one I posted above, it would get through without being considered a bot, wouldn't it? The obvious way to prevent this is to check the Referer header, but isn't there a way for the client to send a different header? I'm pretty sure they can't just use the header() function to send it across domains, but there has to be another way to do it, right? Is there a more secure way to catch this?
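To illustrate the worry about forged headers: as far as I understand it, header() only affects the response a script sends out, but a stream context lets file_get_contents() send whatever request headers it likes. A rough sketch, where buildSpoofedContext() and the header values are made up for illustration:

```php
<?php
// Hypothetical helper: builds a stream context that sends forged request headers.
function buildSpoofedContext(string $referer, string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            // Extra request headers, each terminated by CRLF.
            'header'     => "Referer: $referer\r\n",
            // Value sent as the User-Agent request header.
            'user_agent' => $userAgent,
        ],
    ]);
}

$context = buildSpoofedContext(
    'http://www.mysite.com/index.php',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
);

// The actual request is commented out so the sketch runs without network access:
// $html = file_get_contents('http://www.mysite.com', false, $context);
```

So a Referer or User-Agent check on the server can be bypassed with a few lines on the client side.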

 

Thanks again.

Link to comment
https://forums.phpfreaks.com/topic/95714-bot-crawlers/#findComment-494280

As far as I know, there is no way to prevent scripts from reading pages like that. I tried setting a session variable and then checking whether it was set, but the script got through that too via file_get_contents(). Cookies, however, might distinguish scripts from actual users, but that requires every user to have cookies enabled in order to see your page.
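The cookie idea could look roughly like this. It is only a sketch: cookieCheck(), the cookie name visitor_check, and the token scheme are all assumptions for illustration, and a script that stores and resends cookies would still pass.

```php
<?php
// Hypothetical helper: decides what to do with a request based on a challenge cookie.
// Returns 'serve' when the cookie came back with the expected value,
// otherwise 'challenge' (set the cookie and ask the client to reload).
function cookieCheck(array $cookies, string $name, string $expected): string
{
    // hash_equals() compares the two strings in constant time.
    if (isset($cookies[$name]) && hash_equals($expected, (string) $cookies[$name])) {
        return 'serve';
    }
    return 'challenge';
}

// Usage in a page (commented out so the sketch runs from the command line):
// $token = hash_hmac('sha256', $_SERVER['REMOTE_ADDR'] ?? '', 'replace-with-a-secret');
// if (cookieCheck($_COOKIE, 'visitor_check', $token) === 'challenge') {
//     setcookie('visitor_check', $token);
//     header('Location: ' . $_SERVER['REQUEST_URI']);
//     exit;
// }
```

A plain file_get_contents() call would only ever see the redirect, but so would a real visitor with cookies disabled, who would be redirected in a loop unless the page handles that case.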

Link to comment
https://forums.phpfreaks.com/topic/95714-bot-crawlers/#findComment-494340
