
How to replace only links which aren't in <a href=''></a> brackets


AndyPSV


function rplLnk($x,$style='') {
$x = ereg_replace('[-a-z0-9!#$%&\'*+/=?^_`{|}~]+@([.]?[a-zA-Z0-9_/-])*','<a href=\'mailto:\\0\' '.$style.'>\\0</a>',$x);
$x = ereg_replace("[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]", "<a href='\\0\' $style>\\0</a>", $x);
$x = preg_replace(',(?<!//)www\.[^<>[:space:]]+[[:alnum:]/],i','<a href="http://\0">\0</a>',$x);
return $x;
}

 

thank you

Hey Andy,

 

For the second replacement line, we'll have to be a bit more specific than the [[:alpha:]]+ before the // and name the protocols explicitly. Other than that, I assumed you're happy with the way the url is matched (one way in a million) and only added code to make sure the url is not already part of an existing link. This gives us the following (to replace your second replacement line):

$x=preg_replace(',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);

It checks that the url is not preceded by =" and not followed by </a>
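
To see what those two checks do in isolation, here is a minimal sketch that only lists what the pattern matches, without replacing anything. The sample string and the variable names ($sample, $pattern, $matches) are made up for illustration.

```php
<?php
// Hypothetical sample: one url already inside a link, one bare url.
$sample = '<a href="http://a.com/x">http://a.com/x</a> and http://b.org/y';

// Same pattern as above: (?<!=") rejects a url that sits right after =",
// (?!</a) rejects a url that is immediately followed by </a.
// The atomic group (?>...) keeps the engine from shortening the url
// just to get past the lookahead.
$pattern = ',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i';

preg_match_all($pattern, $sample, $matches);

// $matches[0] holds the full matches; only the bare url should be listed.
print_r($matches[0]);
```

The url in the href is skipped by the lookbehind, the url in the link text is skipped by the lookahead, and only http://b.org/y is left to be wrapped.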

 

For the third replacement line, that line already checks that the www is not preceded by //, taking care of the "not preceded by" check. Adding a check for not followed by </a>, you get:

$x=preg_replace(',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);
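
A quick way to check which www occurrences this pattern picks up is to list the matches on a made-up string. This is only a sketch; the sample text and variable names are assumptions for illustration.

```php
<?php
// Hypothetical sample with three kinds of www occurrences:
// a bare one, one behind http://, and one that is already link text.
$sample = 'see www.example.com/page and http://www.example.org and '
        . '<a href="http://www.example.net">www.example.net</a>';

// (?<!//) skips any www that directly follows //,
// (?!</a) skips any www url that is directly followed by </a.
$pattern = ',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i';

preg_match_all($pattern, $sample, $matches);

// Only the bare www url should be listed.
print_r($matches[0]);
```

Only www.example.com/page is matched here; the other occurrences either follow // or sit right before </a>.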

 

I checked those by running the replacement twice on sample urls to make sure they only get replaced once:

Code:

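<?php
// Bare www url: should be wrapped on the first pass and left alone on the second.
$x='www.google.com/test';
// Url that is already a link: should not be wrapped on either pass.
$y='<a href="http://www.google.com?q=f">http://www.google.com?q=f</a>';
// Run the www replacement twice and print the result after each pass.
for($i=1;$i<3;$i++) {
    $x=preg_replace(',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);
    echo htmlentities($x).'<br />';
}
// Same for the protocol replacement.
for($i=1;$i<3;$i++) {
    $y=preg_replace(',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$y);
    echo htmlentities($y).'<br />';
}
?>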

 

Output:

<a href="http://www.google.com/test">www.google.com/test</a>

<a href="http://www.google.com/test">www.google.com/test</a>

<a href="http://www.google.com?q=f">http://www.google.com?q=f</a>

<a href="http://www.google.com?q=f">http://www.google.com?q=f</a>

 

Again, there are a million ways of matching urls and I am only modifying what you have.

 

Let me know if you have any questions or problems. :)

Hi again Andy,

also:

 

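Just so you know, ereg_replace is deprecated (as of PHP 5.3).

In your code above, the first replacement (the mailto) can use preg_replace in place of ereg_replace without changing the regex itself, as long as you wrap the pattern in delimiters (for example commas, as in the other two lines), because preg_replace requires them.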
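
As a rough sketch of that swap, the mailto line could look like this. The function name linkEmails and the $attr helper variable are hypothetical; the regex is your original one with comma delimiters added, and the replacement uses double quotes around the href.

```php
<?php
// Hypothetical helper: the original mailto regex moved to preg_replace.
function linkEmails($text, $style = '') {
    // Only add a separating space when a style attribute was passed in.
    $attr = $style === '' ? '' : ' ' . $style;

    // Commas act as delimiters; the pattern between them is unchanged.
    $pattern = ',[-a-z0-9!#$%&\'*+/=?^_`{|}~]+@([.]?[a-zA-Z0-9_/-])*,';

    // \0 is the whole match, used for both the href and the link text.
    return preg_replace($pattern, '<a href="mailto:\0"' . $attr . '>\0</a>', $text);
}

echo linkEmails('write to joe@example.com today');
```

Like the original line, this does not check whether the address is already inside a link.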

 

Wishing you a fun day

I've got this markup:

 

<p><span style="color: #222222; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);">View the reply at:&nbsp;</span><a style="color: #1155cc; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);" href="http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new" target="_blank">http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new<br /><br /></a>adsdsadsa<br /><br />http://link1.com/<br /><br />www.link2.com<br /><br />&nbsp;</p>

 

I've used:

 

function rplLnk($x,$style='') {
/*
$x = ereg_replace('[-a-z0-9!#$%&\'*+/=?^_`{|}~]+@([.]?[a-zA-Z0-9_/-])*','<a href=\'mailto:\\0\' '.$style.'>\\0</a>',$x);
$x = ereg_replace("[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]", "<a href='\\0\' $style>\\0</a>", $x);
$x = preg_replace(',(?<!//)www\.[^<>[:space:]]+[[:alnum:]/],i','<a href="http://\0">\0</a>',$x);
*/

$x = ereg_replace('[-a-z0-9!#$%&\'*+/=?^_`{|}~]+@([.]?[a-zA-Z0-9_/-])*','<a href=\'mailto:\\0\' '.$style.'>\\0</a>',$x);
$x = preg_replace(',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);
$x = preg_replace(',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);
return $x;
}

 

and the result is:

 

[attached image: xxx3.gif]

As you can see, the first one (the existing <a href>) was printed correctly,

 

but the two links below it weren't turned into links. How can I fix that?

 

 

THANK YOU.

Actually, when I run the code, everything is transformed.

 

One small change, though.

In the second replace, I used the same replacement string as in the third (copy-paste). That's a mistake: the url matched by the second replacement already includes its protocol, so we need to drop the http:// from the replacement string: <a href="\0">\0</a>

 

This gives us:

$x=preg_replace(',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="\0">\0</a>',$x);
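
Put together, the www and protocol replacements could be wrapped up roughly like this. It is a sketch only: the function name linkifyText, the $attr variable and the sample sentence are made up, and the mailto replacement is left out to keep it short.

```php
<?php
// Hypothetical helper combining the www and protocol replacements.
function linkifyText($text, $style = '') {
    // Only add a separating space when a style attribute was passed in.
    $attr = $style === '' ? '' : ' ' . $style;

    // Bare www.* addresses: the href needs http:// prepended.
    $text = preg_replace(
        ',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i',
        '<a href="http://\0"' . $attr . '>\0</a>',
        $text
    );

    // Urls that already carry a protocol: use the match as-is.
    $text = preg_replace(
        ',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i',
        '<a href="\0"' . $attr . '>\0</a>',
        $text
    );

    return $text;
}

$once  = linkifyText('Visit www.link2.com or http://link1.com/ now');
$twice = linkifyText($once); // a second pass should change nothing
echo $once;
```

Running the result through the function a second time is the same "replace twice" check as before: the lookbehind and lookahead should leave the links from the first pass untouched.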

 

Now here's what happens if we run that ugly string of yours through these regexes:

Code:

<?php
$x='<p><span style="color: #222222; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);">View the reply at:&nbsp;</span><a style="color: #1155cc; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);" href="http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new" target="_blank">http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new<br /><br /></a>adsdsadsa<br /><br />http://link1.com/<br /><br />www.link2.com<br /><br />&nbsp;</p>';
$x=preg_replace(',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);
$x=preg_replace(',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="\0">\0</a>',$x);
echo htmlentities($x).'<br /><br />';
?>

 

Output:

<p><span style="color: #222222; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);">View the reply at:&nbsp;</span><a style="color: #1155cc; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);" href="<a href="http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new&quot">http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new&quot</a>; target="_blank"><a href="http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new<br">http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new<br</a> /><br /></a>adsdsadsa<br /><br /><a href="http://link1.com/<br">http://link1.com/<br</a> /><br /><a href="http://www.link2.com<br">www.link2.com<br</a> /><br />&nbsp;</p>

 

Unless I've missed something, everything has been replaced.

 

Now, it's also true that some weird elements have ended up inside the links, e.g. the trailing <br at the end of www.link2.com<br

 

1. This is the nature of the original regex you provided: the [^<>[:space:]]+ will eat all kinds of characters. As I mentioned, all I did was add checks to the effect that the replaced urls are not already part of existing formed links, as you requested.

 

2. I am not sure whether this needs to be fixed, because I don't know if you are really applying the regex to that ugly string, or if it was just an "escaped" version that you pasted in your last post for some reason.
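
To illustrate both points, here is a small sketch that runs the www pattern on the same fragment twice: once as entity-escaped text and once as raw HTML. The input fragment and variable names are made up, and decoding with html_entity_decode first is just one possible approach if the string really does arrive escaped.

```php
<?php
// Same www pattern as in the thread.
$pattern = ',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i';

// Hypothetical inputs: one fragment as entity-escaped text and as raw HTML.
$escaped = 'www.link2.com&lt;br /&gt;';
$raw     = html_entity_decode($escaped); // 'www.link2.com<br />'

preg_match($pattern, $escaped, $m1);
preg_match($pattern, $raw, $m2);

echo $m1[0], "\n"; // escaped text: &lt;br is swallowed into the match
echo $m2[0], "\n"; // raw HTML: the < ends the match after .com
```

In the escaped version nothing stops [^<>[:space:]]+ before the space, so the match runs on through &lt;br; in the raw version the < ends it right after the domain.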

 

Warmest wishes,

 

 

 

 
