etrader Posted July 25, 2011
I am writing a simple counter, but it counts both human visitors and bots. To tell these visits apart, I read $_SERVER['HTTP_USER_AGENT'] and use preg_match to detect which visits come from a bot. The common approach on the internet is to define a preg_match pattern for each major bot (e.g. googlebot). From what I have seen, all bots include their official URL in the user_agent. So matching "http://" with preg_match should catch all bots, since the user_agent of a human visitor does not include a URL. Right? How would you do this?
Link to comment https://forums.phpfreaks.com/topic/242736-differentiating-between-visitor-and-bot-in-user-agents/
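The "http://" idea from the question can be sketched roughly as follows. The function name looksLikeBotByUrl is hypothetical, and accepting "https://" as well is an assumption that goes beyond the original post. This is only a heuristic: the user agent is supplied by the client, so a bot that omits its URL will still look like a human visitor.

```php
<?php
// Hypothetical helper: treats any user agent containing a URL as a bot.
// Assumption: crawlers advertise an info URL, as Googlebot does with
// "+http://www.google.com/bot.html". Bots that omit one are not detected.
function looksLikeBotByUrl(string $userAgent): bool
{
    // "#" is used as the delimiter so the slashes need no escaping.
    return preg_match('#https?://#i', $userAgent) === 1;
}

// The header may be missing entirely, so fall back to an empty string.
$isBot = looksLikeBotByUrl($_SERVER['HTTP_USER_AGENT'] ?? '');
```

A visit would then be counted only when $isBot is false.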
LeadingWebDev Posted July 25, 2011
<?php
echo $_SERVER['HTTP_USER_AGENT'] . "\n\n";
$browser = get_browser(null, true);
print_r($browser);
?>
This should help, but remember that get_browser has to be configured on the server (the browscap setting in php.ini) and that you have to keep its browser list up to date. Reference: http://php.net/manual/en/function.get-browser.php
etrader Posted July 25, 2011 Author
Thanks LeadingWebDev. Yes, this is an alternative solution, since get_browser reports [crawler] as true or false. But I have read some reviews on the internet saying that [crawler] is not very accurate, probably because get_browser is not very popular. It also returns a lot of information that is useless to me. So I thought that reading $_SERVER['HTTP_USER_AGENT'] and running preg_match would be a lighter process. What do you think?
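For comparison, reading only the [crawler] entry from the get_browser result might look like the sketch below. The helper name isCrawlerFromBrowscap is hypothetical. It takes the array as an argument so that it can be tried without a configured browscap file, and it assumes the flag may arrive either as a boolean or as a string such as "1" or "true".

```php
<?php
// Hypothetical helper: interprets the array returned by get_browser(null, true).
// Assumption: the 'crawler' entry may be a boolean or a string flag.
function isCrawlerFromBrowscap($info): bool
{
    // get_browser() returns false when no browscap file is configured.
    if (!is_array($info) || !isset($info['crawler'])) {
        return false;
    }
    // Accepts true, "1" and "true"; everything else counts as not a crawler.
    return filter_var($info['crawler'], FILTER_VALIDATE_BOOLEAN);
}

// Usage on a server where the browscap directive is set in php.ini:
// $isBot = isCrawlerFromBrowscap(@get_browser(null, true));
```

All other fields in the array are simply ignored, so the extra information costs nothing beyond the lookup itself.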
ohdang888 Posted July 25, 2011
Well, really, you're limited to the information the client provides. In other words, a bot could send an IE header and you wouldn't know (using this method, at least). Anyway, a simple search reveals a quick solution:
<?php
// The /i flag makes the match case-insensitive, so "Slurp" and "Googlebot" both match.
if (preg_match('/slurp|inktomisearch|grub|bot|archiver|sqworm/i', $_SERVER['HTTP_USER_AGENT'] ?? '')) {
    echo "Is not a human";
}
?>
LeadingWebDev Posted July 25, 2011
No, regular expressions put a heavy load on the server. If you are building statistics that shouldn't count crawlers (web search spiders), get_browser will probably help, as every search bot has its own user agent.
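If the cost of regular expressions is the concern, one option is a plain substring check with stripos, sketched below. The function name isBotBySubstring is hypothetical, the token list is copied from the pattern quoted earlier in the thread, and no benchmark is implied; whether this is measurably lighter than preg_match would have to be tested on the actual server.

```php
<?php
// Hypothetical helper: case-insensitive substring check, no regex involved.
function isBotBySubstring(string $userAgent): bool
{
    // Same tokens as the preg_match pattern quoted earlier in the thread.
    $tokens = ['slurp', 'inktomisearch', 'grub', 'bot', 'archiver', 'sqworm'];
    foreach ($tokens as $token) {
        if (stripos($userAgent, $token) !== false) {
            return true;
        }
    }
    return false;
}

// The header may be absent, so fall back to an empty string.
$isBot = isBotBySubstring($_SERVER['HTTP_USER_AGENT'] ?? '');
```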