Pro.Luv Posted February 9, 2009

Hi, I'm using this code to read a robots.txt file and add all the disallowed files and directories to an array:

```php
<?php
$filename = "robots.txt";

// Read the whole file into a string.
$contents = file_get_contents($filename);

// Capture the value after each "Disallow:" directive, one per line.
// [ \t]* (rather than \s*) keeps an empty "Disallow:" from swallowing the next line.
preg_match_all('/^[ \t]*Disallow:[ \t]*(\S+)/im', $contents, $foo);
$disallow = $foo[1];

// each() was removed in PHP 8, so iterate with foreach instead.
foreach ($disallow as $value) {
    print $value . "<br />";
}
```

Now what I want to do is pass in a link and check whether the link, or part of the link, is in the array, like this:

```php
if (in_array("http://www.site.com/disallowedDir/", $disallow)) {
    // Skip link
} else {
    // Do something because the link is not in the array
}
```

I need to know how to do this. Thanks.
genericnumber1 Posted February 9, 2009

I'm guessing your problem is that the sites list only relative directories in their robots.txt, and you want to compare those against an absolute URL? Perhaps something like this would suit your needs:

```php
<?php
$url = 'http://www.site.com/disallowedDir/';
$path = parse_url($url, PHP_URL_PATH);
if (in_array($path, $disallow)) {
    echo 'disallowed';
}
```

Of course, that does not match /disallowedDir (note: no trailing slash), but you can add that check if it fits the robots.txt standard (I'm not sure exactly what the standard says about trailing slashes).
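On the trailing-slash question: under the robots exclusion standard, a Disallow value is matched as a prefix of the URL path, so a prefix comparison may be closer to the intended behaviour than an exact in_array() lookup. Below is a minimal sketch of that idea, assuming $disallow holds the values captured from robots.txt. The helper name isDisallowed() is hypothetical (it is not from the thread), and the sketch ignores User-agent groups, Allow lines, and the * and $ wildcard extensions.

```php
<?php
// Hypothetical helper (not from the thread): returns true when the URL's path
// starts with any Disallow value. It ignores User-agent groups, Allow rules,
// and the * / $ wildcard extensions.
function isDisallowed(string $url, array $disallow): bool
{
    // parse_url() returns null when the URL has no path, false when it is malformed.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        $path = '/';
    }

    foreach ($disallow as $rule) {
        $rule = trim($rule);
        if ($rule === '') {
            continue; // an empty Disallow value blocks nothing
        }
        // Prefix match: compare only the first strlen($rule) bytes of the path.
        if (strncmp($path, $rule, strlen($rule)) === 0) {
            return true;
        }
    }
    return false;
}

// Example values, made up for illustration.
$disallow = ['/disallowedDir/', '/private'];
var_dump(isDisallowed('http://www.site.com/disallowedDir/page.html', $disallow)); // bool(true)
var_dump(isDisallowed('http://www.site.com/public/', $disallow));                 // bool(false)
```

With prefix matching, a rule of /disallowedDir (no trailing slash) also covers /disallowedDir/ and everything beneath it, while a rule of /disallowedDir/ does not cover the bare /disallowedDir path.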