
Automated link checker


merylvingien


I have a link checking system which simply checks whether a member's site has an inbound link to my site. If it all checks out OK, an outbound link is shown on the member's page on my site.

 

It's amazing the lengths people will go to to try and cheat the system and get round it, which is one of the reasons I have automated it.

However, one or two members are adding a nofollow attribute to the inbound links and I am a bit stuck on how to check for this.

 

My working code as it is now:

 

$url1 = "http://{$row2['url']}";
$url2 = "http://{$row2['weblink']}";

// Reduce each URL to its scheme and host so the two can be compared
preg_match('@^(?:http://)?([^/]+)@i', $url1, $matches);
$host = $matches[1];
$sampleurl1 = $matches[0];
preg_match('@^(?:http://)?([^/]+)@i', $url2, $matches2);
$host2 = $matches2[1];
$sampleurl2 = $matches2[0];

if (empty($row2['weblink'])) {
    $link = "";
} elseif ($sampleurl1 == $sampleurl2) {
    // Both URLs are on the same host: fetch the page and look for my link
    $homepage = file_get_contents($url1);
    $mylink = 'http://www.mysite.com';
    $pos = strpos($homepage, $mylink);

    if ($pos === false) {
        $link = "";
    } else {
        $link = "<p>{$row2['fname']} has a web site which can be found here <br><a title='{$row2['linktitle']}' href='http://{$row2['weblink']}'>{$row2['linktitle']}</a></p><br>";
    }
}

 

Would appreciate some thoughts, as most of you guys actually know what you're talking about lol

https://forums.phpfreaks.com/topic/199664-automated-link-checker/

What is used to build the "offending" sites? Lots of blogging and CMS apps automatically add the nofollow attribute to links; the user/site owner has little to no control over this and often doesn't even know the link contains the attribute.

 

If, OTOH, they are manually adding the nofollow to a link back, you should probably treat it as a missing link and drop their link from your site.

 

Just my 2 cents on the nofollow issue.

 

As for finding the nofollow in the link:

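$url = 'mysite.com'; // or www.mysite.com
// $file is assumed to hold the HTML of the member's page (e.g. from file_get_contents())
// preg_quote() stops the dots in $url acting as wildcards; [^>]* keeps each group inside one tag
$pattern = "/<a([^>]*)href=[\"']([^\"']*)" . preg_quote($url, '/') . "(\/?)[\"']([^>]*)>(.*?)<\/a>/is";
preg_match($pattern, $file, $matches);
$clean_array = array_map('trim', $matches);
$clean_array = array_filter($clean_array);
// in_array() needs an exact element, so this only fires when rel="nofollow" is the sole extra attribute
if (in_array('rel="nofollow"', $clean_array)) {
    echo "nofollow found";
} else {
    echo "link is clean";
}
echo "<p><pre>";
print_r($clean_array);
echo "</pre></p>";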
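
If the regex turns out to be too brittle (extra attributes, different quoting, rel="external nofollow"), one possible alternative is to let PHP's DOM extension parse the page and read the rel attribute directly. This is only a rough sketch: checkBacklink() and its three return values are made-up names for illustration, not anything from the code above, and it assumes the DOM extension is enabled.

```php
<?php
// Hypothetical helper: reports whether $html links to $host and whether
// at least one such link is free of nofollow.
// Returns 'missing', 'nofollow', or 'clean'.
function checkBacklink(string $html, string $host): string
{
    if (trim($html) === '') {
        return 'missing'; // nothing to parse
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Compare hosts without a leading "www." and without regard to case
    $wanted = preg_replace('/^www\./i', '', strtolower($host));
    $found = false;

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $linkHost = parse_url($anchor->getAttribute('href'), PHP_URL_HOST);
        if (!is_string($linkHost)) {
            continue; // relative or malformed href
        }
        $linkHost = preg_replace('/^www\./i', '', strtolower($linkHost));
        if ($linkHost !== $wanted) {
            continue; // link to some other site
        }
        $found = true;

        // rel is a space-separated token list, e.g. rel="external nofollow"
        $rel = strtolower(trim($anchor->getAttribute('rel')));
        $tokens = preg_split('/\s+/', $rel, -1, PREG_SPLIT_NO_EMPTY);
        if (!in_array('nofollow', $tokens, true)) {
            return 'clean'; // at least one followable link back
        }
    }

    return $found ? 'nofollow' : 'missing';
}
```

Calling checkBacklink($file, 'mysite.com') and treating anything other than 'clean' as a missing link would match the "drop their link" idea above.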

 

 

 

HTH

Teamatomic

Thanks for the reply. Most of my members' sites are small self-built 4-5 page sites; some of them are more professionally designed, but I would hazard a guess that 99% of them are not using a CMS.

 

They are basically just trying to get round my system for a free link without having to give one back.

 

I will have a play with your suggestion and see if I can integrate that into what I have now and put paid to their antics!  :smoker:

 

At the very least it will give them another 6 months of head scratching, trying new ways of getting around my code LOL

I am so close I can taste it LOL

 

$homepage = file_get_contents("$url1");
$mylink = 'http://www.mysite.com';
$pos = strpos($homepage, $mylink);
$endlink = substr($homepage, ($pos + strlen($mylink)), 200); # Get HTML after URL
$begininglink = substr($homepage, ($pos - strlen($mylink) - 100), 200); # Get HTML before URL
$completelink = array($begininglink, $endlink);
if (in_array('rel="nofollow"', $completelink)) {
    echo "nofollow found";
} else {
    echo "link is clean <br>";
}
print_r($completelink);

 

I've gone about it slightly differently from teamatomic's suggestion, only because I am not too clued up on arrays.

This code is echoing out the HTML before and after the link. I have a page that I am looking at with a nofollow attribute, but it's still showing "link is clean" when it should be saying "nofollow found"!

 

Someone must be able to see the fault here!

print_r your $completelink array. Unless 'rel="nofollow"' is the exact string of one of the element values, it will not be "found". in_array requires an exact match. That's why the preg into the array: you get the results if the link is found, then you can check for the nofollow in the array. You kill two pigs with one statement (and get a bacon sandwich to boot!)
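
To make the exact-match point concrete, here is a tiny sketch. The two sample chunks are made-up HTML standing in for what substr() might return, and the substring search at the end is just one possible workaround, not the only way to do it.

```php
<?php
// Made-up chunks, similar in shape to the two substr() results
$completelink = array('<a rel="nofollow" href="', '/">My Site</a> more text');

// in_array() compares whole elements, so a fragment inside an element is no match
$exact = in_array('rel="nofollow"', $completelink);

// A case-insensitive substring search over the joined chunks does see it
$substring = stripos(implode('', $completelink), 'rel="nofollow"') !== false;

var_dump($exact);     // bool(false)
var_dump($substring); // bool(true)
```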

 

Just a thought: if you concatenate the beginning and end link vars, explode the result into an array on spaces, and then clean it up, you should be able to get 'rel="nofollow"' into an element alone. Then in_array will find it.
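
Roughly what that could look like, as a sketch only. chunksContainNofollow() is a made-up name, and it splits on tag brackets as well as whitespace (a small variation on a plain explode on spaces) so that a trailing ">" does not stick to the attribute.

```php
<?php
// Hypothetical helper: join the HTML before and after the URL, break it
// into pieces, and look for a stand-alone rel="nofollow" piece.
function chunksContainNofollow(string $before, string $after): bool
{
    // Split on whitespace and on < > so each attribute becomes its own element
    $parts = preg_split('/[\s<>]+/', $before . ' ' . $after, -1, PREG_SPLIT_NO_EMPTY);
    $parts = array_map('strtolower', $parts);

    // in_array() needs an exact element, so check both quoting styles
    return in_array('rel="nofollow"', $parts, true)
        || in_array("rel='nofollow'", $parts, true);
}

var_dump(chunksContainNofollow('<p>See <a href="', '/" rel="nofollow">My Site</a></p>'));
var_dump(chunksContainNofollow('<p>See <a href="', '/">My Site</a></p>'));
```

Note that a multi-value attribute such as rel="external nofollow" would still slip past this, because the split leaves the two words in separate elements with the quotes attached.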

 

 

HTH

Teamatomic

Archived

This topic is now archived and is closed to further replies.
