
Robot.txt


Pro.Luv


Hi,

I'm using this code to read a robots.txt file and add all of the disallowed files and directories to an array:

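$filename = "robots.txt";

// Read the whole file; file_get_contents() returns false on failure
$contents = file_get_contents($filename);
if ($contents === false) {
    die("Could not read $filename");
}

// Capture the path after each "Disallow:" (case-insensitive, one rule per line)
preg_match_all('/^[ \t]*Disallow:[ \t]*(\S+)/im', $contents, $matches);
$disallow = $matches[1];

// foreach replaces each(), which was removed in PHP 8
foreach ($disallow as $value) {
    print $value . "<br />";
}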

Now I want to pass in a link and check whether the link, or part of it, is in the array, like this:

if(in_array("http://www.site.com/disallowedDir/", $disallow)){
    // Skip link
}else{
    // Do something, because the link is not in the array
}

How can I do this? Thanks.

https://forums.phpfreaks.com/topic/144520-robottxt/

I'm guessing your problem is that sites list only relative paths in their robots.txt, and you want to compare them against an absolute URL?

Perhaps something like this would suit your needs:

<?php
$url = 'http://www.site.com/disallowedDir/';
$path = parse_url($url, PHP_URL_PATH);

if(in_array($path, $disallow)) {
    echo 'disallowed';
}

 

Of course that does not match /disallowedDir (note: no trailing slash), but you can add that check in if it fits the robots.txt standard (I'm not sure what the standard is exactly with regard to trailing slashes).
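On the trailing-slash question: robots.txt rules are conventionally treated as path prefixes rather than exact strings, so an exact in_array() comparison would also miss something like /disallowedDir/page.html. Here is a minimal sketch of a prefix check. The function name is_disallowed() and the sample rules are made up for illustration, and the sketch ignores User-agent groups, Allow lines, and wildcards, so treat it as a starting point rather than a full parser:

```php
<?php
// Hypothetical helper (not from the thread): returns true when the URL's
// path begins with any Disallow rule taken from robots.txt.
function is_disallowed(string $url, array $disallow): bool
{
    // parse_url() returns null when the URL has no path, false if malformed
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        $path = '/';
    }

    foreach ($disallow as $rule) {
        $rule = trim($rule);
        // An empty Disallow value blocks nothing, so skip it
        if ($rule === '') {
            continue;
        }
        // Prefix comparison: does $path start with $rule?
        if (strncmp($path, $rule, strlen($rule)) === 0) {
            return true;
        }
    }
    return false;
}

// Example rules, standing in for the $disallow array built from robots.txt
$disallow = ['/disallowedDir/', '/private'];

var_dump(is_disallowed('http://www.site.com/disallowedDir/page.html', $disallow)); // bool(true)
var_dump(is_disallowed('http://www.site.com/disallowedDir', $disallow));           // bool(false)
var_dump(is_disallowed('http://www.site.com/public/index.html', $disallow));       // bool(false)
```

With a prefix check, a rule written without the trailing slash (/disallowedDir) covers both /disallowedDir and everything beneath it, while a rule written with the slash only covers paths inside the directory.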

