Any way to identify bots?


RyanSF07

Hello,

 

I'm using the plain and simple script below to count page views. Is there a way to identify a spider bot with PHP, so that if the visitor is a bot the view isn't counted, and otherwise it is?

 

 

Here's my simple script:

 

// Cast the id to an integer so request data never reaches the SQL as text.
$id = (int) $_GET['id'];

$querySelect = mysql_query("SELECT counter FROM video WHERE id = $id");
$rowcount = mysql_fetch_assoc($querySelect);
$count = $rowcount['counter'];

if (empty($count)) {
    // First view: the row already exists, so UPDATE it (INSERT does not accept a WHERE clause).
    $insert = mysql_query("UPDATE video SET counter = 1 WHERE id = $id");
} else {
    $add = $count + 1;
    $insertNew = mysql_query("UPDATE video SET counter = $add WHERE id = $id");
}

 

thanks,

Ryan

a good bot is hard to spot. bad bots aren't as hard to spot.

 

As far as I know, the most sure-fire way to detect a bot is to require that the visitor execute JavaScript code to count as a hit. But with PHP you can check things like the user agent, a reverse DNS lookup to try to get the remote domain name, other header information that might be available, etc.
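The reverse DNS idea mentioned above could look roughly like the sketch below. It is not from the thread: the function name `isVerifiedCrawler` and its parameters are made up for illustration, and the allowed domain suffixes are whatever the crawler's operator documents. It does a reverse lookup, checks the host name against the suffixes, then does a forward lookup to confirm the name maps back to the same IP. It only handles IPv4, because `gethostbynamel()` returns IPv4 addresses.

```php
<?php
/**
 * Forward-confirmed reverse DNS check (sketch, hypothetical helper).
 * $reverse and $forward are injectable so the logic can be exercised without
 * network access; by default they are PHP's gethostbyaddr() and gethostbynamel().
 */
function isVerifiedCrawler($ip, array $allowedSuffixes, $reverse = 'gethostbyaddr', $forward = 'gethostbynamel')
{
    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged on
    // failure and false on malformed input.
    $host = call_user_func($reverse, $ip);
    if ($host === false || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // Step 2: the host name must end in one of the allowed domain suffixes.
    $matched = false;
    foreach ($allowedSuffixes as $suffix) {
        $suffix = '.' . ltrim(strtolower($suffix), '.');
        if (substr($host, -strlen($suffix)) === $suffix) {
            $matched = true;
            break;
        }
    }
    if (!$matched) {
        return false;
    }

    // Step 3: the forward lookup must map the host name back to the same IP,
    // otherwise anyone who controls their own reverse DNS could fake the name.
    $ips = call_user_func($forward, $host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

A call might look like `isVerifiedCrawler($_SERVER['REMOTE_ADDR'], array('googlebot.com', 'google.com'))`. DNS lookups are slow, so it probably makes sense to run this only when the user agent already claims to be a crawler, and to cache the result per IP.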

As an aside, there's no reason to run 3 database queries in that script. In phpMyAdmin, alter the table so any future entries get 0 as the default value in the `counter` field, then set all of the empty `counter` fields to 0 (this only needs to be done once):

UPDATE `video` SET `counter` = 0 WHERE `counter` = '' OR `counter` IS NULL

 

Then change the script above so that when a valid visitor is detected it executes this query (after validating/sanitizing $_GET['id'] of course):

"UPDATE `video` SET `counter` = (`counter` + 1) WHERE `id` = {$_GET['id']}"

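For the "validating/sanitizing" step, one possible approach is a prepared statement. This is only a sketch: it assumes a PDO connection rather than the `mysql_*` functions used in the thread, and the function name `incrementViewCount` is hypothetical. The table and column names are the ones from the posts above.

```php
<?php
/**
 * Increment the view counter for one video in a single query (sketch).
 * Assumes a PDO connection and the `video` table (`id`, `counter`) from the thread.
 */
function incrementViewCount(PDO $pdo, $rawId)
{
    // Reject anything that is not a positive integer before touching the database.
    $id = filter_var($rawId, FILTER_VALIDATE_INT, array('options' => array('min_range' => 1)));
    if ($id === false) {
        return false;
    }

    // The bound parameter keeps the id out of the SQL text entirely.
    $stmt = $pdo->prepare('UPDATE video SET counter = counter + 1 WHERE id = :id');
    $stmt->execute(array(':id' => $id));

    // True only when a row with that id existed and was updated.
    return $stmt->rowCount() === 1;
}
```

It would be called as `incrementViewCount($pdo, isset($_GET['id']) ? $_GET['id'] : null)` once the visitor has passed whatever bot check is in place.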
Thank you, Picachu. That worked perfectly.

 

Here's what I have now (below). My question now is: do I have to loop through this array, or is there some identifying "tag" that all bots have that flags them as bots?

 

That way I could just check for that tag, and if it's present -- not count the page view.  Please let me know if you have any ideas.

 

Thank you again for your help. 

Ryan

$botarray = array(   
                "Teoma",                   
                "alexa",
                "froogle",
                "inktomi",
                "looksmart",
                "URL_Spider_SQL",
                "Firefly",
                "NationalDirectory",
                "Ask Jeeves",
                "TECNOSEEK",
                "InfoSeek",
                "WebFindBot",
                "girafabot",
                "crawler",
                "Googlebot",
                "Scooter",
                "Slurp",
                "appie",
                "FAST",
                "WebBug",
                "Spade",
                "ZyBorg");


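// $HTTP_USER_AGENT only exists with register_globals; read the header from $_SERVER instead.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isBot = false;

foreach ($botarray as $botname) {
    // Plain case-sensitive substring match; the names are not regular expressions.
    if (strpos($userAgent, $botname) !== false) {
        $isBot = true;

        // Email a notice naming the bot that matched.
        $recep = "[email protected]";
        $subject = "... bot";
        $text = $botname;
        $headers = "X-Mailer: PHP\n";
        mail($recep, $subject, $text, $headers);
        break;
    }
}

// Count the view only when no bot name matched. (Setting a flag in the else
// branch would count every visit, since most names never match.)
if (!$isBot) {
    $id = (int) $_GET['id'];
    mysql_query("UPDATE `video` SET `counter` = (`counter` + 1) WHERE `id` = $id");
}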


 

 

Nice bots will read and follow your robots.txt file, and they will give you plenty of information in headers to recognize them, including user agent.
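For the well-behaved bots that do announce themselves, the user agent check can be done with a single pattern instead of a loop. A minimal sketch, assuming a list of names like the one posted above; the function name `declaredBotName` is made up for illustration, and it only catches bots that identify themselves honestly.

```php
<?php
/**
 * Return the bot name found in the user agent, or null if none matches (sketch).
 * Builds one case-insensitive alternation from the list, escaping each name
 * so that it is matched literally.
 */
function declaredBotName($userAgent, array $botNames)
{
    if (!is_string($userAgent) || $userAgent === '' || count($botNames) === 0) {
        return null;
    }

    $quoted = array_map(function ($name) {
        return preg_quote($name, '/');
    }, $botNames);
    $pattern = '/' . implode('|', $quoted) . '/i';

    // $m[0] holds the matched text exactly as it appears in the user agent.
    return preg_match($pattern, $userAgent, $m) === 1 ? $m[0] : null;
}
```

Usage would be something like `if (declaredBotName($ua, $botarray) === null) { /* count the view */ }`, where `$ua` is read from `$_SERVER['HTTP_USER_AGENT']` with a fallback to an empty string.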

 

Bad bots (yandex.ru for one) and data scraper bots will ignore robots.txt and give no indication that they are bots, except that they (probably) will not execute JavaScript. They will usually send a regular web browser user agent.

There is nothing that all bots have in common. The lowest common denominator is their usual inability to parse and execute JavaScript, but even that fails for smarter bots, which can parse and execute some JavaScript. For what it's worth: I write bot code and write code to (attempt to) detect bots.
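Counting a hit only when JavaScript runs could be sketched as follows. Everything named here is hypothetical: the `count.php` endpoint, the `$secret` value, and the three function names. The video page prints a small script that requests `count.php`, and `count.php` checks a token before incrementing. As noted above, bots that do execute JavaScript will still be counted, and the token here is fixed per video, so it only stops requests that never saw the page.

```php
<?php
// Hypothetical names throughout: count.php, $secret, and all three functions.

/** Token tying a beacon request to a page that this server actually rendered. */
function beaconToken($videoId, $secret)
{
    return hash_hmac('sha256', (string) (int) $videoId, $secret);
}

/** HTML snippet for the video page; only clients that run JavaScript will request count.php. */
function renderViewBeacon($videoId, $secret)
{
    $query = http_build_query(array(
        'id' => (int) $videoId,
        't'  => beaconToken($videoId, $secret),
    ));

    // json_encode() produces a safely quoted JavaScript string literal.
    $url = json_encode('/count.php?' . $query);
    return '<script>(new Image()).src = ' . $url . ';</script>';
}

/** Meant to be called inside count.php before incrementing the counter. */
function isValidBeacon($videoId, $token, $secret)
{
    // hash_equals() compares in constant time to avoid leaking the token.
    return is_string($token) && hash_equals(beaconToken($videoId, $secret), $token);
}
```

The video page would `echo renderViewBeacon($id, $secret);` and `count.php` would run the `UPDATE` only when `isValidBeacon()` returns true for the `id` and `t` request parameters.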

