[SOLVED] Anchor Parsing Regex

Goldeneye · September 9, 2008

I wrote a regular expression that automatically wraps any text prefixed with http:// in an anchor, and it worked reasonably well. Then I tested it further and realized that it conflicts with my image-parsing regular expression. As you can guess, it also parses the URL inside the tag's attribute, which means that instead of

<img src="http://foo.bar/image.gif">

I get...

<img src="<a href="http://foo.bar/image.gif">http://foo.bar/image.gif</a>">

Here is what I tried to resolve this conflict:

<?php
//This is inside a function; $textstring is a function parameter
preg_replace('{(?<!href=\')((https?|ftp|gopher|irc)://|(news|mailto|aim):)(&(?![gl]t;)|[-a-zA-Z0-9;/?:@=+$,_.!~*\'()%#])+}', '<a href="$0">$0</a>', $textstring);

//The following method properly parsed the image, but the assumed link remained unparsed (plain-text).
if(!preg_match('/<img src="/si', $textstring)) $textstring = preg_replace('/http:\/\/([^\/]+)[^\s]*/', '<a href="$0" target="_blank">$1</a>', $textstring);
?>
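
One way to picture the conflict described above is to skip any URL that directly follows a quote, since that is where an attribute value such as src="..." begins. The following is only a minimal sketch under that assumption; linkBareUrls() is a hypothetical helper name and the pattern is deliberately simpler than the one in the post.

```php
<?php
// Hypothetical helper, not from the thread: link bare http(s) URLs but
// leave alone any URL that directly follows a quote (an attribute value).
function linkBareUrls(string $text): string
{
    return preg_replace(
        // (?<!["\']) is a negative lookbehind: no quote right before the URL.
        '~(?<!["\'])\bhttps?://[^\s<>"\']+~i',
        '<a href="$0">$0</a>',
        $text
    );
}

$html = '<img src="http://foo.bar/image.gif"> see http://foo.bar/page';
echo linkBareUrls($html), "\n";
```

The lookbehind only guards quoted attribute values. Running the helper twice over the same text would re-link the anchor text it produced, so it should be applied once.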

JasonLewis · September 10, 2008

I've had the same problem but with the URL bbcode tag.

The solution was this, for url:

preg_match_all('%
     (
        (?>
           ### Protocol or start.
           (?:
              (?:(?:https?|ftp)://)
              |
              www\.
           )
           ### Body: gobble everything except [/url]
           (?:(?!\[/url\]).)+
           ### Avoid ending punctuation.
           (?<!\p{P})
        )
        ### Not followed by an url end.
        (?!\[/url\])
     )
     %x', $str, $matches);

But I'm sure you could change the url to img and give it a try.
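
To see what the pattern above picks up, here is a small sketch that runs it on a made-up two-line string. It assumes the non-capturing groups are the usual (?:...) form; the sample text and example.com / example.org addresses are placeholders, not from the thread.

```php
<?php
// Assumed reading of the pattern above, "url" variant.
$pattern = '%
    (
       (?>
          (?:(?:(?:https?|ftp)://)|www\.)   # protocol or www.
          (?:(?!\[/url\]).)+                # body: stop before [/url]
          (?<!\p{P})                        # do not end on punctuation
       )
       (?!\[/url\])                         # skip URLs already inside [url]
    )
    %x';

// First line: URL already wrapped in [url] tags. Second line: a bare URL.
$str = "[url]http://example.com[/url]\nvisit http://example.org.";
preg_match_all($pattern, $str, $matches);
print_r($matches[1]);
```

Only the bare URL on the second line should be captured, without its trailing full stop. One thing worth knowing: the body has no whitespace stop, so any text that follows a URL on the same line is swallowed into the match as well.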

Goldeneye · September 14, 2008

Hmm, I tried implementing this and it didn't seem to do anything. Using my text formatting function (see below), I tested

[img=http://www.google.ca/images/nav_logo3.png]
http://google.com

But the only output was 10. Did I implement preg_match_all() correctly?

I also checked the PHP Manual, but it wasn't much help, to be honest.

<?php
function formatText($str) {
	$str = stripslashes($str);
	$str = htmlentities($str, ENT_COMPAT, 'UTF-8');
	$str = nl2br($str);
	$search = array(
					'/\[b\](.*?)\[\/b\]/si', 
					'/\[i\](.*?)\[\/i\]/si', 
					'/\[u\](.*?)\[\/u\]/si', 
					'/\[align\=(left|center|right)\](.*?)\[\/align\]/si', 
					'/\[img\](.*?)\[\/img\]/si', 
					'/((mailto:|(http|ftp|nntp|news):\/\/).*?)(\s|<|\)|"|\\\\|\'|$)/'
					);
	$replace = array(
					'<b>$1</b>', 
					'<i>$1</i>', 
					'<u>$1</u>', 
					'<div style="text-align: $1;">$2</div>', 
					'<a href="$1"><img src="$1" class="image"></a>', 
					'<a href="$1" rel="nofollow" target="_blank">$1</a>$4'
					);
	$str = preg_match_all('%((?>(?:(?:(?:https?|ftp)://)|www\.)(?:(?!\[/img\]).)+(?<!\p{P}))(?!\[/img\]))%x', $str, $matches);
	$str = preg_replace($search, $replace, $str);
	return $str;
}
?>

JasonLewis · September 14, 2008

Well, looking at that, your mailto: will also muck up the regular expression.

Instead of preg_match_all, use preg_replace:

$str = preg_replace('%((?>(?:(?:(?:https?|ftp)://)|www\.)(?:(?!\[/img\]).)+(?<!\p{P}))(?!\[/img\]))%x', "<a href='\\1'>\\1</a>", $str);

But I have a feeling that your mailto will cause issues.
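
For reference, the number that came out earlier is most likely explained by the return values: preg_match_all() returns the count of matches, so assigning it to $str throws the text away, while preg_replace() returns the rewritten string. A minimal sketch with a deliberately simple placeholder pattern (not the one above):

```php
<?php
$str = 'see http://example.org';
$pattern = '%https?://\S+%';   // simplified pattern, for illustration only

// preg_match_all() returns how many matches it found (an int);
// the matches themselves go into the third argument.
$count = preg_match_all($pattern, $str, $matches);

// preg_replace() returns the rewritten string.
$linked = preg_replace($pattern, "<a href='\\0'>\\0</a>", $str);

var_dump($count);
echo $linked, "\n";
```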

Goldeneye · September 14, 2008

Thanks a lot, ProjectFear, it worked. As for the mailto, I'll most likely remove it since it's not often used, and this gives me yet another reason to. Again, thanks!
