I have knocked up this little class for retrieving a list of known robot user agents from the really rather helpful people over at robotstxt.org. It pulls info from their site and builds an array that can be used to compare against the $_SERVER['HTTP_USER_AGENT'] variable. It has an exclusion array that can be altered to suit your personal preferences, and the object can be echoed directly to produce a valid JSON string that can be passed as is to jQuery/JavaScript using AJAX or anything of that like.

I am putting no restrictions on this, but the people over at robotstxt.org do request that you give them a mention for accessing their data, so I leave that up to anyone who may want to use it.

Anyway, I found the need to be able to ensure bots didn't get free rein of the site I was making and thought that some other people out there may have a use for this. Here it is, enjoy (maybe) - anyway let me know what you guys think of it. (p.s. - I'm new to the whole DocBlock thing... )

<?php
/**
 * Generates a list of robot user-agent definitions for use with
 * $_SERVER['HTTP_USER_AGENT'] to identify robots
 *
 * This links into the robotstxt.org site to access their current
 * robot list. It then produces an array of these user agents that
 * can be used to check if a visitor is a robot or not.
 *
 * Call: $yourVar = new getRobots();
 *       $robotArray = $yourVar->robots;
 *
 * JSON output (if you want to pass to JavaScript): echo $yourVar;
 *
 * @param string $url    Link to robotstxt.org server
 * @param array  $robots The array list of user agents
 * @return __toString Returns JSON string of Object{"robots":array[{"numericalKey":"useragentText"}]}
 */
class getRobots{
    public $url;
    public $robots = array();

    public function __construct($url = "http://www.robotstxt.org/db/all.txt"){
        $this->url = $url;
        $fullList = file($url);
        if($fullList === false){
            return; // the list could not be read, so $robots stays empty
        }
        $exclusions = array // add lines here to exclude any other agents in the list
        (
            "",
            "no",
            "Due to a deficiency in Java it's not currently possible to set the User-Agent.",
            "???",
            "yes"
        );
        foreach($fullList as $content){
            $split = explode(":", $content);
            if(trim($split[0]) == "robot-useragent"){
                // Rejoin everything after the field name; a value may itself
                // contain ":" and those colons come back as spaces here.
                $conCount = count($split);
                $agent = "";
                for($i = 1; $i < $conCount; $i++){
                    $agent .= " {$split[$i]} ";
                }
                array_push($this->robots, trim($agent));
            }
        }
        // Drop placeholder values that are not real user agents
        foreach($this->robots as $key => $agent){
            if(in_array($agent, $exclusions)){
                unset($this->robots[$key]);
            }
        }
    }

    public function __toString(){
        $json = "{\"robots\":[".json_encode($this->robots)."]}";
        return $json;
    }
}
?>
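For anyone wondering how the array might actually be compared against $_SERVER['HTTP_USER_AGENT'], here is a minimal sketch. The function name isRobot() is hypothetical and not part of the class above, and the case-insensitive substring match is only an assumption about what "compare" should mean; an exact match may suit some sites better. The two sample entries are illustrative stand-ins so the snippet runs without a network call; in real use you would pass $yourVar->robots instead.

```php
<?php
/**
 * Hypothetical helper (not part of the getRobots class): returns true when
 * any known robot user-agent string occurs inside the visitor's user agent.
 */
function isRobot(string $userAgent, array $robots): bool
{
    foreach ($robots as $agent) {
        // Skip empty entries: depending on the PHP version an empty needle
        // either raises a warning or matches every string.
        if ($agent === '') {
            continue;
        }
        // Case-insensitive substring match (an assumption, see above)
        if (stripos($userAgent, $agent) !== false) {
            return true;
        }
    }
    return false;
}

// Illustrative stand-in for $yourVar->robots; keys may be non-contiguous
// because the class unsets excluded entries.
$sampleRobots = array(0 => 'Googlebot', 3 => 'ia_archiver');

// The header can be missing, so fall back to an empty string.
$visitor = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (isRobot($visitor, $sampleRobots)) {
    // e.g. serve a reduced page or refuse the request here
}
```

Since the list is fetched from robotstxt.org every time the class is constructed, it is probably worth caching the array (in a file or session, say) rather than building it on every page load.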