How to replace only links that aren't already inside <a href=''></a> tags


AndyPSV

function rplLnk($x,$style='') {
    $x = ereg_replace('[-a-z0-9!#$%&\'*+/=?^_`{|}~]+@([.]?[a-zA-Z0-9_/-])*','<a href=\'mailto:\\0\' '.$style.'>\\0</a>',$x);
    $x = ereg_replace("[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]", "<a href='\\0\' $style>\\0</a>", $x);
    $x = preg_replace(',(?<!//)www\.[^<>[:space:]]+[[:alnum:]/],i','<a href="http://\0">\0</a>',$x);
    return $x;
}

 

thank you


Hey Andy,

 

For the second replacement line, we'll have to be more specific than the [[:alpha:]]+ before the :// and name the protocols explicitly. Other than that, I assumed you're happy with the way the url is matched (it is one way in a million) and only added code to make sure the url is not already part of a link. This gives us the following (to replace your second replacement line):

$x=preg_replace(',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);

It checks that the url is not preceded by =" (so it is not an href value) and not followed by </a (so it is not the text of an existing link).

 

For the third replacement line, that line already checks that the www is not preceded by //, taking care of the "not preceded by" check. Adding a check for not followed by </a>, you get:

$x=preg_replace(',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);
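In case the (?>...) part looks mysterious: it is an atomic group, and here is a small sketch of why the "not followed by </a" check seems to need it. The sample string and variable names are only for illustration. Without the atomic group, the regex engine is free to give back the last character of the url so that the lookahead passes, and a truncated url gets linked inside the existing link.

```php
<?php
// An already-linked url: a second pass should leave it untouched.
$linked = '<a href="http://www.google.com/test">www.google.com/test</a>';

// Pattern from the post above, with the atomic group (?>...).
$atomic = ',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i';
// Same pattern with the atomic group removed (for comparison only).
$plain  = ',(?<!//)www\.[^<>[:space:]]+[[:alnum:]/](?!</a),i';

$replacement = '<a href="http://\0">\0</a>';

$withAtomic    = preg_replace($atomic, $replacement, $linked);
$withoutAtomic = preg_replace($plain, $replacement, $linked);

var_dump($withAtomic === $linked);    // bool(true): nothing was replaced
var_dump($withoutAtomic === $linked); // bool(false): "www.google.com/tes" got linked
echo htmlentities($withoutAtomic), "\n";
```

The non-atomic version backs off one character ("tes" instead of "test"), at which point the next characters are t</a rather than </a, so the lookahead no longer objects.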

 

I checked those by running the replacement twice on sample urls to make sure they only get replaced once:

Code:

<?php
$x='www.google.com/test';
$y='<a href="http://www.google.com?q=f">http://www.google.com?q=f</a>';
for($i=1;$i<3;$i++) {
    $x=preg_replace(',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);
    echo htmlentities($x).'<br />';
}
for($i=1;$i<3;$i++) {
    $y=preg_replace(',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$y);
    echo htmlentities($y).'<br />';
}
?>

 

Output:

<a href="http://www.google.com/test">www.google.com/test</a>

<a href="http://www.google.com/test">www.google.com/test</a>

<a href="http://www.google.com?q=f">http://www.google.com?q=f</a>

<a href="http://www.google.com?q=f">http://www.google.com?q=f</a>

 

Again, there are a million ways of matching urls and I am only modifying what you have.

 

Let me know if you have any questions or problems. :)


Hi again Andy,

also:

 

Just so you know, ereg_replace is deprecated as of PHP 5.3 (and was removed entirely in PHP 7.0).

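In your code above, in the first replacement (the mailto), without looking at the regex itself, you should be able to substitute preg_replace where it says ereg_replace, as long as you also wrap the pattern in delimiters (for instance a comma on each side, as in the other two lines), because preg_replace requires them.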
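A minimal sketch of that swap for the mailto line, untested against your real data. The function name linkEmails and the sample address are made up for illustration. Note that preg_replace needs delimiters around the pattern; a comma is used here because the pattern itself contains none. The regex is otherwise unchanged, and like the ereg version it stays case-sensitive.

```php
<?php
// Hypothetical helper name; only the pattern comes from the original function.
function linkEmails($x, $style = '') {
    // Same email pattern as before, now wrapped in comma delimiters for PCRE.
    $pattern = ',[-a-z0-9!#$%&\'*+/=?^_`{|}~]+@([.]?[a-zA-Z0-9_/-])*,';
    // Only add the extra attribute text when a style was passed in.
    // (Assumes $style contains no \1 or $1 style sequences, which
    // preg_replace would treat as backreferences.)
    $attr = ($style === '') ? '' : ' ' . $style;
    return preg_replace($pattern, '<a href="mailto:\0"' . $attr . '>\0</a>', $x);
}

echo linkEmails('Write to bob@example.com today'), "\n";
// Write to <a href="mailto:bob@example.com">bob@example.com</a> today
```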

 

Wishing you a fun day


I've got this HTML:

 

<p><span style="color: #222222; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);">View the reply at:&nbsp;</span><a style="color: #1155cc; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);" href="http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new" target="_blank">http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new<br /><br /></a>adsdsadsa<br /><br />http://link1.com/<br /><br />www.link2.com<br /><br />&nbsp;</p>

 

I've used:

 

function rplLnk($x,$style='') {
    /*
    $x = ereg_replace('[-a-z0-9!#$%&\'*+/=?^_`{|}~]+@([.]?[a-zA-Z0-9_/-])*','<a href=\'mailto:\\0\' '.$style.'>\\0</a>',$x);
    $x = ereg_replace("[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]", "<a href='\\0\' $style>\\0</a>", $x);
    $x = preg_replace(',(?<!//)www\.[^<>[:space:]]+[[:alnum:]/],i','<a href="http://\0">\0</a>',$x);
    */

    $x = ereg_replace('[-a-z0-9!#$%&\'*+/=?^_`{|}~]+@([.]?[a-zA-Z0-9_/-])*','<a href=\'mailto:\\0\' '.$style.'>\\0</a>',$x);
    $x = preg_replace(',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);
    $x = preg_replace(',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);
    return $x;
}

 

and the result is:

 

[screenshot: xxx3.gif]

As you can see, the first one (the existing <a href>) was printed correctly,

but the two links below it weren't turned into links. How can I fix that?

 

 

THANK YOU.


Actually, when I run the code, everything is transformed.

 

One small change, though.

In the second replace, I reused the replacement string from the third one (copy-paste). That's a mistake: the url matched by the second pattern already starts with its protocol, so the extra prefix produces href="http://http://...". We need to drop the http:// from the replacement string: <a href="\0">\0</a>

 

This gives us:

$x=preg_replace(',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="\0">\0</a>',$x);
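To make the difference concrete, here is a small sketch that runs the same pattern with both replacement strings. The sample text (including the ftp address) is invented for the demonstration.

```php
<?php
$pattern = ',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i';
$text    = 'see http://link1.com/ and ftp://files.example.org/pub';

// Copy-pasted replacement: prefixes a protocol the match already has.
$doubled = preg_replace($pattern, '<a href="http://\0">\0</a>', $text);
// Corrected replacement: uses the match as the href unchanged.
$fixed   = preg_replace($pattern, '<a href="\0">\0</a>', $text);

echo htmlentities($doubled), "\n"; // hrefs start with http://http:// and http://ftp://
echo htmlentities($fixed), "\n";   // hrefs are exactly the matched urls
```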

 

Now here's what happens if we run that ugly string of yours through these regexes:

Code:

<?php
$x='<p><span style="color: #222222; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);">View the reply at:&nbsp;</span><a style="color: #1155cc; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);" href="http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new" target="_blank">http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new<br /><br /></a>adsdsadsa<br /><br />http://link1.com/<br /><br />www.link2.com<br /><br />&nbsp;</p>';
$x=preg_replace(',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="http://\0">\0</a>',$x);
$x=preg_replace(',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i','<a href="\0">\0</a>',$x);
echo htmlentities($x).'<br /><br />';
?>

 

Output:

<p><span style="color: #222222; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);">View the reply at:&nbsp;</span><a style="color: #1155cc; font-family: arial, sans-serif; font-size: 13px; line-height: normal; background-color: rgba(255, 255, 255, 0.917969);" href="<a href="http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new&quot">http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new&quot</a>; target="_blank"><a href="http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new<br">http://www.phpfreaks.com/forums/index.php?topic=354157.new;topicseen#new<br</a> /><br /></a>adsdsadsa<br /><br /><a href="http://link1.com/<br">http://link1.com/<br</a> /><br /><a href="http://www.link2.com<br">www.link2.com<br</a> /><br />&nbsp;</p>

 

Unless I've missed something, everything has been replaced.

 

Now, it's also true that some weird elements have ended up inside the links, e.g. the <br at the end of www.link2.com<br

 

1. This is the nature of the original regex you provided: the [^<>[:space:]]+ will eat all kinds of characters. As I mentioned, all I did was add checks to the effect that the replaced urls are not already part of existing formed links, as you requested.

 

2. I am not sure whether this needs to be fixed, because I don't know if you are really applying the regex to that ugly string, or if it was just an "escaped" version that you pasted in your last post for some reason.
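To illustrate both points, here is a rough sketch that runs the two patterns over a short piece of raw markup and over the same markup after htmlspecialchars. The wrapper name autolinkSketch and the shortened sample string are assumptions for the demo, not your actual data. On raw markup the [^<>[:space:]]+ stops at the < of <br, while on the escaped version &lt;br contains no < or whitespace, so it is swallowed into the link.

```php
<?php
// Hypothetical wrapper around the two patterns from this thread.
function autolinkSketch($x) {
    $x = preg_replace(',(?<!//)www\.(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i', '<a href="http://\0">\0</a>', $x);
    $x = preg_replace(',(?<!=")(?:http|ftp|file)://(?>[^<>[:space:]]+[[:alnum:]/])(?!</a),i', '<a href="\0">\0</a>', $x);
    return $x;
}

$raw     = 'http://link1.com/<br /><br />www.link2.com<br />';
$escaped = htmlspecialchars($raw); // turns < and > into &lt; and &gt;

$fromRaw     = autolinkSketch($raw);
$fromEscaped = autolinkSketch($escaped);

echo $fromRaw, "\n";     // links end cleanly before each <br />
echo $fromEscaped, "\n"; // links end in "&lt;br", as in the output above
```

If the string really is escaped when it reaches the function, one option (again, only a suggestion) is to run the replacement before escaping, or to exclude & from the url character class.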

 

Warmest wishes,

 

 

 

 

