etrader Posted July 25, 2011
I am writing a simple counter, but it counts both visitors and bots. To tell these visits apart, I read $_SERVER['HTTP_USER_AGENT'] and use preg_match to detect which visits come from a bot. The common approach on the internet is to write a preg_match pattern for each major bot (e.g. Googlebot). From what I have seen, every bot includes its official URL in its user agent. So a preg_match on "http://" should catch all bots, since a human visitor's user agent does not include a URL. Right? What do you think of this approach?
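A minimal sketch of the heuristic etrader describes, assuming that a URL in the user agent marks a crawler. The function name ua_has_url is hypothetical. It also accepts https:// in case a crawler lists a secure URL. This only catches bots that volunteer a URL: a client such as curl sends a user agent like "curl/8.0.1" with no URL and would be counted as a visitor.

```php
<?php
// Sketch of the "URL in the user agent means bot" heuristic.
// Assumption: crawlers advertise an info URL and browsers do not.
// Clients that send no URL (e.g. curl's default "curl/<version>") slip through.
function ua_has_url(string $userAgent): bool
{
    // '~' as the delimiter avoids escaping the slashes; /i ignores case.
    return preg_match('~https?://~i', $userAgent) === 1;
}

// Usage: HTTP_USER_AGENT may be missing entirely, so default to ''.
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (ua_has_url($ua)) {
    echo "Probably a bot\n";
}
```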
LeadingWebDev Posted July 25, 2011
<?php
echo $_SERVER['HTTP_USER_AGENT'] . "\n\n";
$browser = get_browser(null, true);
print_r($browser);
?>
This should help you, but remember that get_browser() only works once browscap is configured on the server (the browscap directive in php.ini must point to a browscap.ini file), and you have to keep that browser list up to date.
Reference: http://php.net/manual/en/function.get-browser.php
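To use only the crawler flag from get_browser(), a small wrapper can interpret the returned array. This is a sketch, and the helper name browscap_is_crawler is hypothetical. It assumes the array comes from get_browser($ua, true), where the browscap "crawler" field may arrive as "1", "" or a true/false string depending on how the ini file is parsed, so filter_var() normalises it. When browscap is not configured, get_browser() emits a warning and returns false, which the helper maps to null.

```php
<?php
// Hypothetical helper: reads the "crawler" field from get_browser($ua, true).
// Returns true/false, or null when get_browser() failed or the field is absent.
function browscap_is_crawler($info): ?bool
{
    if (!is_array($info) || !array_key_exists('crawler', $info)) {
        return null;
    }
    // Normalise "1", "", "true", "false", true or false to a real boolean.
    return filter_var($info['crawler'], FILTER_VALIDATE_BOOLEAN);
}

// Usage: only call get_browser() when the browscap directive is set,
// otherwise it warns and returns false.
if (ini_get('browscap')) {
    $info = get_browser($_SERVER['HTTP_USER_AGENT'] ?? null, true);
    var_dump(browscap_is_crawler($info));
}
```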
etrader Posted July 25, 2011 (Author)
Thanks LeadingWebDev. Yes, this is an alternative solution, as get_browser will report [crawler] as true or false. But I have read some reviews online saying that [crawler] is not very accurate, probably because get_browser is not very popular. Besides, it returns lots of information that is useless to me. So I thought using $_SERVER['HTTP_USER_AGENT'] with preg_match would be a lighter process. What do you think?
ohdang888 Posted July 25, 2011
Well, really, you're limited to the information the client provides. In other words, a bot could send an Internet Explorer user agent and you wouldn't know (with this method, at least). Anyway, a simple search turns up a quick solution:
<?php
// Character classes like [Bb] match either case of the first letter.
if (preg_match('/slurp|inktomisearch|[Gg]rub|[Bb]ot|archiver|[Ss]qworm/', $_SERVER['HTTP_USER_AGENT'])) {
    echo "Is not a human";
}
?>
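The same pattern can be wrapped in a reusable function. This sketch uses the /i modifier to make the whole match case-insensitive instead of spelling out [Xx] classes, which is a small widening of the original (it also matches "GRUB" or "BOT"). The function name looks_like_bot is hypothetical, and an empty user agent is treated as human here, which is a choice you may want to reverse for a counter.

```php
<?php
// Sketch: ohdang888's pattern as a function, case-insensitive via /i.
function looks_like_bot(string $userAgent): bool
{
    return preg_match('/slurp|inktomisearch|grub|bot|archiver|sqworm/i', $userAgent) === 1;
}

// Usage: a missing user agent falls back to '' and is reported as human.
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
echo looks_like_bot($ua) ? "Is not a human\n" : "Probably a human\n";
```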
LeadingWebDev Posted July 25, 2011
No, regular expressions put a heavy load on the server. If you are building statistics that shouldn't count crawlers (web search spiders), get_browser will probably help you, as every search bot has its own user agent.
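If the concern is the cost of regular expressions, one option is a regex-free check with plain substring search, and rather than assuming which is cheaper, the two can be timed side by side. This is a sketch: the token list (including "crawler" and "spider") and the function name ua_contains_bot_token are hypothetical, and the timings it prints depend entirely on the machine and PHP version.

```php
<?php
// Sketch: a regex-free bot check using case-insensitive substring search.
// The token list is hypothetical and far from complete.
const BOT_TOKENS = ['bot', 'slurp', 'crawler', 'spider', 'archiver'];

function ua_contains_bot_token(string $userAgent): bool
{
    foreach (BOT_TOKENS as $token) {
        // stripos() returns false when the token is absent, so compare strictly.
        if (stripos($userAgent, $token) !== false) {
            return true;
        }
    }
    return false;
}

// Rough timing of both approaches with hrtime() (PHP 7.3+), in nanoseconds.
$ua = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
$n = 100000;

$t0 = hrtime(true);
for ($i = 0; $i < $n; $i++) {
    ua_contains_bot_token($ua);
}
$t1 = hrtime(true);
for ($i = 0; $i < $n; $i++) {
    preg_match('/bot|slurp|crawler|spider|archiver/i', $ua);
}
$t2 = hrtime(true);

printf("stripos: %.1f ms, preg_match: %.1f ms\n", ($t1 - $t0) / 1e6, ($t2 - $t1) / 1e6);
```

Note that get_browser() itself matches the user agent against patterns from browscap.ini, so it is not a regex-free alternative either; measuring on your own server is the reliable way to compare.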