seito Posted March 17, 2012

Hi! Any help on this topic would be much appreciated.

Here is the deal: I need to redirect visitors from some countries to another URL, but I need to allow free access for search engine bots even if they come from those countries.

Here is what I have done so far: I used the MaxMind PHP API and databases to set up the redirect, and then tried to modify the code to redirect ONLY if the user is from a blocked country AND IS NOT an allowed crawler... Surprise, surprise, it's not working as I imagined :)

Here is the code so far:

    #!/usr/bin/php -q
    <?php
    // This code demonstrates how to lookup the country by IP Address
    include("geoip.inc");
    // Uncomment if querying against GeoIP/Lite City.
    // include("geoipcity.inc");
    $gi = geoip_open("GeoIP.dat",GEOIP_STANDARD);
    $country = geoip_country_code_by_addr($gi, $_SERVER['REMOTE_ADDR']);
    geoip_close($gi);
    $my_countries = array('de', 'se', 'no', 'ee', 'it', 'lv', 'cz', 'dk', 'sk', 'at',
        'ch', 'lu', 'nl', 'es', 'hu', 'sg', 'gr', 'ua', 'fi', 'ru', 'cn', 'hk', 'my',
        'id', 'us', 'ar', 'mx', 'ec', 'cr', 'py', 'br');
    $allowed_spiders = array('Googlebot', 'Yammybot', 'Openbot', 'Yahoo', 'MSNbot',
        'Ask Jeeves', 'Teoma', 'Architext spider', 'FAST-WebCrawler', 'Slurp',
        'Yahoo Slurp', 'ia_archiver', 'Scooter', 'crawler@fast', 'Crawler',
        'InfoSeek sidewinder', 'Lycos_Spider_(T-Rex)');
    $agent_name = $_SERVER['HTTP_USER_AGENT'];
    if (in_array(strtolower($country), $my_countries)) {
        foreach($allowed_spiders as $s){
            if(!strpos($s,$agent_name)){
                header('Location: www.REDIRECT URL.com');
            }
        }
        exit;
    }
    ?>

How I imagined this code should work: first, check where the user is coming from. Then compare the GeoIP short code for the country to the short codes of the banned countries. If the traffic is indeed from a banned country, also check whether the user agent name contains anything from the array of allowed spider names (Googlebot can be Googlebot, or Googlebot/2.1, for example). And if it's not an allowed spider, redirect the user to another address.

I tried many variations of this but nothing seems to work. Each time I tried to fetch the page as Googlebot in Webmaster Tools, I got a 302 redirect. Any help or guidance here is MUCH appreciated. Thank you in advance, PHP JEDI!!

Link to comment: https://forums.phpfreaks.com/topic/259117-geotargeting-and-allowing-spiderscrawlers/
scootstah Posted March 17, 2012

Try this:

    if (in_array(strtolower($country), $my_countries)) {
        if (!in_array($agent_name, $allowed_spiders)) {
            header('Location: www.REDIRECT URL.com');
        }
        exit;
    }
seito (Author) Posted March 17, 2012

Thank you for the fast reply. Sadly, it's not working... yet! I actually tried this at first. The problem with !in_array is that it searches for an exact match in $allowed_spiders, and a spider's user agent can have extra parts attached to its name. Googlebot, for instance, can be Googlebot/2.1, and in that case it's not recognized and gets redirected... Any other ideas?
scootstah Posted March 17, 2012

The reason strpos doesn't work is that your spider names are too generic. You have "Googlebot", but if the bot is actually "Googlebot/2.1" it's not going to match. The other way around, though, would match. So you're either going to need to put all of the actual bot names in your array, or use something like similar_text or levenshtein.
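"The other way around" can be read literally: strpos() takes the haystack first and the needle second, so the full user agent has to be the first argument and the short spider name the second. The sketch below is one way to write that check, not code from the thread. The helper name isAllowedSpider is hypothetical, and the type hints assume PHP 7 or later. It also avoids two other traps in the loop from the first post: a match at position 0 evaluates as false under `!`, and the redirect fires as soon as any single name in the list fails to match.

```php
<?php
// Hypothetical helper (not from the thread): returns true if the user agent
// contains ANY of the allowed spider names, ignoring case.
function isAllowedSpider(string $userAgent, array $allowedSpiders): bool
{
    foreach ($allowedSpiders as $spider) {
        // stripos($haystack, $needle): the full user agent is the haystack.
        // Compare with !== false, because a match at position 0 returns 0.
        if (stripos($userAgent, $spider) !== false) {
            return true;
        }
    }
    // Only after every name has been tried do we know it is not a spider.
    return false;
}

$allowed = array('Googlebot', 'Slurp', 'MSNbot');
$agent = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
var_dump(isAllowedSpider($agent, $allowed));
```

With a helper like this, the redirect decision is made once, after the loop, instead of inside it.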
SaCH Posted March 17, 2012

Try this code:

    if (in_array(strtolower($country), $my_countries)) {
        foreach($allowed_spiders as $s){
            if(!stristr($s,$agent_name)){
                header('Location: www.REDIRECT URL.com');
            }
        }
        exit;
    }
seito (Author) Posted March 17, 2012

Quote:

    Try this code
    if (in_array(strtolower($country), $my_countries)) {
        foreach($allowed_spiders as $s){
            if(!stristr($s,$agent_name)){
                header('Location: www.REDIRECT URL.com');
            }
        }
        exit;
    }

No, sorry! This still gives me 302 Moved Temporarily... This has now become a challenge, huh?! Any more ideas?

BTW, can I ask what the purpose of <?php ob_start(); ?> after the code you provided is?

@scootstah: I think we are onto something now. Where could I find such a list of spider user agents? I tried searching on Google but only found really old ones (2006) or DNS records. No up-to-date user agents for bots... Or could you please elaborate a bit more on the similar_text() and levenshtein() functions? I checked the topics at the links you provided, but I have trouble modifying the examples to work in my situation.
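As a rough illustration of what similar_text() and levenshtein() measure here, the sketch below compares a short spider name against a user agent string in three ways. The helper name compareSpiderName is hypothetical and the type hints assume PHP 7 or later. Both functions score how far apart two whole strings are, so a long real-world user agent scores as very different from "Googlebot" even though it contains that name. A plain substring test is shown alongside for comparison.

```php
<?php
// Hypothetical helper (not from the thread): compares a spider name with a
// user agent using edit distance, similarity percentage and a substring test.
function compareSpiderName(string $name, string $userAgent): array
{
    // similar_text() writes the similarity percentage into its third argument.
    similar_text($name, $userAgent, $percent);

    return array(
        // Number of single-character edits needed to turn one string into the other.
        'levenshtein'     => levenshtein($name, $userAgent),
        'similar_percent' => $percent,
        // True if the name appears anywhere in the user agent, ignoring case.
        'contains'        => stripos($userAgent, $name) !== false,
    );
}

$short = compareSpiderName('Googlebot', 'Googlebot/2.1');
$full  = compareSpiderName(
    'Googlebot',
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
);
print_r($short);
print_r($full);
```

Any cut-off chosen for the distance or the percentage would be an assumption to tune per bot, which is why the substring test is usually the simpler fit for this problem.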
SaCH Posted March 17, 2012

Quote:

    BTW, can I ask what the purpose of <?php ob_start(); ?> after the code you provided is?

It was my forum signature, lol!
SaCH Posted March 17, 2012

Try my new version of the code:

    if (in_array(strtolower($country), $my_countries)) {
        foreach($allowed_spiders as $s){
            list($val1,$val2) = explode(";",$agent_name);
            list($check) = explode("/",$val2);
            if(!stristr($s,$check)){
                header('Location: www.REDIRECT URL.com');
            }
        }
        exit;
    }
SaCH Posted March 17, 2012

@seito Did my new version of the code work, or is there still an error?
seito (Author) Posted March 17, 2012

No, sadly it didn't. I still get a 302 redirect when fetching as Googlebot in Webmaster Tools.

To update you on my own changes:

- I have started to use the CloudFlare service. I think this is of no importance for our code and that $agent_name = $_SERVER['HTTP_USER_AGENT']; should still work normally.
- CloudFlare offers its own geotargeting solution, which does matter for us, since users are routed through their proxy and so do not necessarily "appear" to come from their own country.

I changed this:

    #!/usr/bin/php -q
    <?php
    // This code demonstrates how to lookup the country by IP Address
    include("geoip.inc");
    // Uncomment if querying against GeoIP/Lite City.
    // include("geoipcity.inc");
    $gi = geoip_open("GeoIP.dat",GEOIP_STANDARD);
    $country = geoip_country_code_by_addr($gi, $_SERVER['REMOTE_ADDR']);
    geoip_close($gi);

into this:

    #!/usr/bin/php -q
    <?php
    $country = $_SERVER["HTTP_CF_IPCOUNTRY"];

which should return the user's REAL two-letter (XY) country code... I think I did it right, since blocking by country now seems to be stable, as far as I managed to test it through free proxy servers. I will need to motivate my WWW friends to run some tests too... any volunteers?

From now on, the blocked countries, allowed bots and user agent stay the same:

    $my_countries = array('de', 'se', 'no', 'ee', 'it', 'lv', 'cz', 'dk', 'sk', 'at',
        'ch', 'lu', 'nl', 'es', 'hu', 'sg', 'gr', 'ua', 'fi', 'ru', 'cn', 'hk', 'my',
        'id', 'us', 'ar', 'mx', 'ec', 'cr', 'py', 'br');
    $allowed_spiders = array('Googlebot', 'Yammybot', 'Openbot', 'Yahoo', 'MSNbot',
        'Ask Jeeves', 'Teoma', 'Architext spider', 'FAST-WebCrawler', 'Slurp',
        'Yahoo Slurp', 'ia_archiver', 'Scooter', 'crawler@fast', 'Crawler',
        'InfoSeek sidewinder', 'Lycos_Spider_(T-Rex)');
    $agent_name = $_SERVER['HTTP_USER_AGENT'];

To put it together, this is what I have at the moment (including your solution):

    #!/usr/bin/php -q
    <?php
    $country = $_SERVER["HTTP_CF_IPCOUNTRY"];
    $my_countries = array('de', 'se', 'no', 'ee', 'it', 'lv', 'cz', 'dk', 'sk', 'at',
        'ch', 'lu', 'nl', 'es', 'hu', 'sg', 'gr', 'ua', 'fi', 'ru', 'cn', 'hk', 'my',
        'id', 'us', 'ar', 'mx', 'ec', 'cr', 'py', 'br');
    $allowed_spiders = array('Googlebot', 'Yammybot', 'Openbot', 'Yahoo', 'MSNbot',
        'Ask Jeeves', 'Teoma', 'Architext spider', 'FAST-WebCrawler', 'Slurp',
        'Yahoo Slurp', 'ia_archiver', 'Scooter', 'crawler@fast', 'Crawler',
        'InfoSeek sidewinder', 'Lycos_Spider_(T-Rex)');
    $agent_name = $_SERVER['HTTP_USER_AGENT'];
    if (in_array(strtolower($country), $my_countries)) {
        foreach($allowed_spiders as $s){
            list($val1,$val2) = explode(";",$agent_name);
            list($check) = explode("/",$val2);
            if(!stristr($s,$check)){
                header('Location: www.REDIRECT URL.com');
            }
        }
        exit;
    }
    ?>
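One thing the switch to $_SERVER["HTTP_CF_IPCOUNTRY"] changes is that the value now comes from a request header, so it is simply absent when a request does not pass through the proxy. A minimal sketch of a defensive lookup follows. The helper names visitorCountry and isBlockedCountry are hypothetical, the $server array is passed in as a parameter only so the functions can be tried outside a web request, and the nullable types assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the thread): reads the two-letter country
// code from a $_SERVER-style array, or returns null if the header is missing.
function visitorCountry(array $server): ?string
{
    if (empty($server['HTTP_CF_IPCOUNTRY'])) {
        return null;
    }
    // Normalise to lowercase so it can be compared with a lowercase list.
    return strtolower($server['HTTP_CF_IPCOUNTRY']);
}

// Hypothetical helper: a missing country is never treated as blocked.
function isBlockedCountry(?string $country, array $blockedCountries): bool
{
    return $country !== null && in_array($country, $blockedCountries, true);
}

$my_countries = array('de', 'se', 'no');
$country = visitorCountry($_SERVER);
var_dump(isBlockedCountry($country, $my_countries));
```

Whether a visitor with no country code should be let through or redirected is a policy choice; this sketch lets them through.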
seito (Author) Posted March 19, 2012

Hi. If anybody needs something similar in the future, this is what worked for me in the end:

    <?php
    $country = $_SERVER["HTTP_CF_IPCOUNTRY"];
    $my_countries = array('de', 'se', 'no', 'ee', 'it', 'lv', 'cz', 'dk', 'sk', 'at',
        'ch', 'lu', 'nl', 'es', 'hu', 'sg', 'gr', 'ua', 'fi', 'ru', 'cn', 'hk', 'my',
        'id', 'ar', 'mx', 'ec', 'cr', 'py', 'br', 'us');
    if (in_array(strtolower($country), $my_countries)) {
        function getnotCrawler($userAgent) {
            $crawlers = 'firefox|Google|msnbot|Rambler|Yahoo|AbachoBOT|accoona|' .
                'AcioRobot|ASPSeek|CocoCrawler|Dumbot|FAST-WebCrawler|' .
                'GeonaBot|Gigabot|Lycos|MSRBOT|Scooter|AltaVista|IDBot|eStyle|Scrubby';
            $notCrawler = (preg_match("/$crawlers/i", $userAgent) == 0);
            return $notCrawler;
        }
        $notCrawler = getnotCrawler($_SERVER['HTTP_USER_AGENT']);
        if ($notCrawler) {
            header('Location: www.REDIRECTION URL.com');
            exit;
        } else {
            // crawler: no redirect
        }
    }
    ?>

$_SERVER["HTTP_CF_IPCOUNTRY"]; is specific to CloudFlare, which I'm using. If you are not using it, adapt this part to your needs to get the two-letter (XY) code of the visitor's country.
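For anyone adapting the solution above, here is a minimal sketch of the same idea with the decision pulled into a function that can be tried on its own. It is not seito's code. The names buildCrawlerPattern and shouldRedirect are hypothetical, https://www.example.com/ stands in for the real redirect target, the type hints assume PHP 7 or later, and the short lists are examples only. Three details differ from the post: each name is passed through preg_quote() so that entries such as 'Lycos_Spider_(T-Rex)' are matched literally, the Location header uses a full URL with a scheme (a value like www.example.com without one is treated as a relative path), and the list leaves out 'firefox', which would also exempt ordinary Firefox visitors.

```php
<?php
// Hypothetical helper: builds one case-insensitive pattern from a list of
// crawler names, escaping any regex metacharacters in the names.
function buildCrawlerPattern(array $names): string
{
    $quoted = array_map(function ($name) {
        return preg_quote($name, '/');
    }, $names);
    return '/' . implode('|', $quoted) . '/i';
}

// Hypothetical helper: true only for visitors from a blocked country whose
// user agent matches none of the crawler names.
function shouldRedirect(string $country, string $userAgent, array $blocked, array $crawlers): bool
{
    if (!in_array(strtolower($country), $blocked, true)) {
        return false;
    }
    if ($crawlers === array()) {
        return true; // an empty pattern would match every user agent
    }
    return preg_match(buildCrawlerPattern($crawlers), $userAgent) !== 1;
}

$blocked  = array('de', 'se', 'us');                       // example list only
$crawlers = array('Googlebot', 'msnbot', 'Yahoo', 'Lycos_Spider_(T-Rex)');
$country  = isset($_SERVER['HTTP_CF_IPCOUNTRY']) ? $_SERVER['HTTP_CF_IPCOUNTRY'] : '';
$agent    = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (shouldRedirect($country, $agent, $blocked, $crawlers)) {
    // Placeholder URL (assumption): replace with the real target.
    header('Location: https://www.example.com/', true, 302);
    exit;
}
```

Keep in mind that the User-Agent header is set by the client, so any visitor can claim to be Googlebot; this check distinguishes well-behaved crawlers from browsers, not trusted from untrusted traffic.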