
[SOLVED] Anchor Parsing Regex


Goldeneye


I've written a regular expression that automatically wraps text prefixed with http:// in an anchor, and it worked reasonably well. When I tested it, though, I realized that it conflicts with my image-parsing regular expression. As you can guess, it also parses the URL inside the image tag, which means that instead of

 

<img src="http://foo.bar/image.gif">

I get...

<img src="<a href="http://foo.bar/image.gif">http://foo.bar/image.gif</a>">

 

Here is what I tried to resolve this conflict:

 

<?php
// This is inside a function; $textstring is a function parameter.
// The single quotes inside the pattern are escaped so the PHP string stays valid.
$textstring = preg_replace('{(?<!href=\')((https?|ftp|gopher|irc)://|(news|mailto|aim):)(&(?![gl]t;)|[-a-zA-Z0-9;/?:@=+$,_.!~*\'()%#])+}', '<a href="$0">$0</a>', $textstring);

// The following method properly parsed the image, but the assumed link remained unparsed (plain text).
if (!preg_match('/<img src="/si', $textstring)) {
	$textstring = preg_replace('/http:\/\/([^\/]+)[^\s]*/', '<a href="$0" target="_blank">$1</a>', $textstring);
}
?>
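
One possible way around the conflict is to never run the URL regex on the tags themselves. This is only a sketch under my own assumptions: `autolinkOutsideTags()` is a hypothetical helper name, and the URL pattern is deliberately simplified to http/https. It splits the string into tags and text with `preg_split()` and links URLs in the text pieces only.

```php
<?php
// Hypothetical helper (not from the original post): link bare URLs
// only in the text that sits between HTML tags.
function autolinkOutsideTags(string $html): string
{
    // With PREG_SPLIT_DELIM_CAPTURE the captured tags land at odd indexes,
    // the text between them at even indexes.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            continue; // a tag such as <img src="...">: leave it untouched
        }
        // Simplified URL pattern (assumption): http/https up to whitespace or a quote.
        $parts[$i] = preg_replace(
            '~\bhttps?://[^\s<>"\']+~i',
            '<a href="$0">$0</a>',
            $part
        );
    }

    return implode('', $parts);
}

echo autolinkOutsideTags('<img src="http://foo.bar/image.gif"> see http://foo.bar/page');
```

Note that this does not know about text that is already inside an `<a>...</a>` pair, so it should run before any anchors are generated.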


I've had the same problem but with the URL bbcode tag.

 

The solution was this, for url:

 

preg_match_all('%
     (
        (?>
           ### Protocol or start.
           (?:
              (?:(?:https?|ftp)://)
              |
              www\.
           )
           ### Body: gobble everything except [/url]
           (?:(?!\[/url\]).)+
           ### Avoid ending punctuation.
           (?<!\p{P})
        )
        ### Not followed by an url end.
        (?!\[/url\])
     )
     %x', $str, $matches);

 

But I'm sure you could change the url to img and give it a burl.
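
To show why the atomic group `(?>...)` matters in a pattern like this, here is a small comparison. It is a sketch with one swap of my own: the body uses `\S` instead of `.`, so a match stops at whitespace rather than running to the end of the line. The variable names are made up for the example.

```php
<?php
$text = '[url]http://a.com[/url]';

// Atomic version: once the URL body is matched, the engine may not give characters back.
$atomic = '%(?>(?:(?:https?|ftp)://|www\.)(?:(?!\[/url\])\S)+(?<!\p{P}))(?!\[/url\])%';

// Same pattern with a plain non-capturing group instead of the atomic group.
$plain = '%(?:(?:(?:https?|ftp)://|www\.)(?:(?!\[/url\])\S)+(?<!\p{P}))(?!\[/url\])%';

preg_match_all($atomic, $text, $a);
preg_match_all($plain, $text, $p);

var_dump($a[0]); // empty array: the tagged URL is skipped entirely
var_dump($p[0]); // one match, "http://a.co": backtracking dropped the final "m"
```

Without the atomic group the final lookahead fails, the engine backs off one character, and the shortened URL slips through.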

Hmm, I tried implementing this and it didn't seem to do anything. Using my text formatting function (see below), I tested

 

[img=http://www.google.ca/images/nav_logo3.png]
http://google.com

 

But the only output was 10. Did I use preg_match_all() correctly?

 

I also checked the PHP Manual, but it wasn't that big of a help, to be honest.

 

<?php
function formatText($str) {
	$str = stripslashes($str);
	$str = htmlentities($str, ENT_COMPAT, 'UTF-8');
	$str = nl2br($str);
	$search = array(
					'/\[b\](.*?)\[\/b\]/si', 
					'/\[i\](.*?)\[\/i\]/si', 
					'/\[u\](.*?)\[\/u\]/si', 
					'/\[align\=(left|center|right)\](.*?)\[\/align\]/si', 
					'/\[img\](.*?)\[\/img\]/si', 
					'/((mailto:|(http|ftp|nntp|news):\/\/).*?)(\s|<|\)|"|\\\\|\'|$)/'
					);
	$replace = array(
					'<b>$1</b>', 
					'<i>$1</i>', 
					'<u>$1</u>', 
					'<div style="text-align: $1;">$2</div>', 
					'<a href="$1"><img src="$1" class="image"></a>', 
					'<a href="$1" rel="nofollow" target="_blank">$1</a>$4'
					);
	$str = preg_match_all('%((?>(?:(?:(?:https?|ftp)://)|www\.)(?:(?!\[/img\]).)+(?<!\p{P}))(?!\[/img\]))%x', $str, $matches);
	$str = preg_replace($search, $replace, $str);
	return $str;
}
?>
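
A note on the function above: `preg_match_all()` returns the number of full matches (or `false` on error) and puts the matched text into the third argument, so assigning its return value to `$str` replaces the text with a number. A minimal sketch with a simplified URL pattern of my own:

```php
<?php
$str = 'see http://google.com and http://php.net';

// The return value is a match count; the matches themselves go into $matches.
$count = preg_match_all('~https?://\S+~', $str, $matches);

var_dump($count);      // int(2)
print_r($matches[0]);  // the two URLs that were found
var_dump($str);        // the original string, unchanged
```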

Well, looking at that, your mailto: will also muck up the regular expression.

 

Instead of preg_match_all, use preg_replace:

 

$str = preg_replace('%((?>(?:(?:(?:https?|ftp)://)|www\.)(?:(?!\[/img\]).)+(?<!\p{P}))(?!\[/img\]))%x', "<a href='\\1'>\\1</a>", $str);

 

But I have a feeling that your mailto will cause issues.
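
As a rough illustration of how the ordering could work, here is a trimmed-down sketch. `linkThenImages()` is a hypothetical stand-in for the real formatting function, and the URL body uses `[^\s<]` instead of `.` (my assumption, so a link stops at whitespace or at a following tag such as `<br />`). Bare URLs are linked first, then `[img]` is converted, and the generic mailto/http rule is left out so it cannot re-link what is already linked.

```php
<?php
// Hypothetical two-step stand-in for the formatting function discussed above.
function linkThenImages(string $str): string
{
    // Step 1: link bare URLs, skipping any URL that is closed by [/img].
    $str = preg_replace(
        '%((?>(?:(?:https?|ftp)://|www\.)(?:(?!\[/img\])[^\s<])+(?<!\p{P}))(?!\[/img\]))%',
        '<a href="$1">$1</a>',
        $str
    );

    // Step 2: turn [img]...[/img] into an image tag.
    return preg_replace('/\[img\](.*?)\[\/img\]/si', '<img src="$1" class="image">', $str);
}

echo linkThenImages('[img]http://foo.bar/image.gif[/img] http://google.com');
```

URLs that start with www. would still need a scheme added to the href, and the test input from earlier uses `[img=...]`, which the `[img]...[/img]` rule does not cover.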

