
[SOLVED] Anchor Parsing Regex


Goldeneye


I wrote a regular expression that automatically wraps text prefixed with http:// in an anchor, and it worked reasonably well. When I tested it further, I realized that it conflicts with my image-parsing regular expression. As you can guess, it also parses the URL inside the tag's attribute, which means that instead of

 

<img src="http://foo.bar/image.gif">

I get...

<img src="<a href="http://foo.bar/image.gif">http://foo.bar/image.gif</a>">

 

Here is what I tried to resolve this conflict:

<?php
//This is inside a function; $textstring is a function parameter
$textstring = preg_replace('{(?<!href=\')((https?|ftp|gopher|irc)://|(news|mailto|aim):)(&(?![gl]t;)|[-a-zA-Z0-9;/?:@=+$,_.!~*\'()%#])+}', '<a href="$0">$0</a>', $textstring);

//The following method properly parsed the image, but the assumed link remained unparsed (plain-text).
if (!preg_match('/<img src="/si', $textstring)) $textstring = preg_replace('/http:\/\/([^\/]+)[^\s]*/', '<a href="$0" target="_blank">$1</a>', $textstring);
?>
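
One possible way around this kind of conflict, sketched here rather than taken from the thread, is to linkify only the text between tags, so that a URL inside src="..." is never touched. The helper name linkifyOutsideTags and the simplified URL pattern are assumptions for illustration, and the tag split is naive (it assumes no ">" inside attribute values).

```php
<?php
// Hypothetical helper: wrap bare URLs in anchors, but only in the text
// between tags, so URLs inside attributes such as src="..." are left alone.
function linkifyOutsideTags(string $html): string
{
    // Split into alternating text / tag chunks. Because the tag pattern is
    // captured, the tags themselves are kept in the result.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        // With a single capture group, odd indexes hold the captured tags.
        if ($i % 2 === 1) {
            continue;
        }
        // Simplified URL pattern (an assumption, not the one from the post).
        $parts[$i] = preg_replace(
            '~\b(?:https?|ftp)://[^\s<>"\']+~i',
            '<a href="$0">$0</a>',
            $part
        );
    }

    return implode('', $parts);
}

echo linkifyOutsideTags('<img src="http://foo.bar/image.gif"> see http://foo.bar/page');
```

This sketch would still re-link a URL that is already the text of an existing anchor, and it keeps trailing punctuation in the link, so it is a starting point only.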


I've had the same problem but with the URL bbcode tag.

 

The solution was this, for url:

 

preg_match_all('%
     (
        (?>
           ### Protocol or start.
           (?:
              (?:(?:https?|ftp)://)
              |
              www\.
           )
           ### Body: gobble everything except [/url]
           (?:(?!\[/url\]).)+
           ### Avoid ending punctuation.
           (?<!\p{P})
        )
        ### Not followed by an url end.
        (?!\[/url\])
     )
     %x', $str, $matches);

 

But I'm sure you could change the url to img and give it a burl.
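
To see what that pattern does and does not pick up, here is a small sketch that runs it over a made-up sample string (the sample text is an assumption, not from the thread). A URL already wrapped in [url]...[/url] should be skipped, a bare URL should match, and a trailing full stop should be dropped.

```php
<?php
// Same pattern as above, in extended (x) mode.
$pattern = '%
    (
       (?>
          (?: (?:https?|ftp):// | www\. )   # protocol or www.
          (?: (?!\[/url\]) . )+             # body: anything up to [/url]
          (?<!\p{P})                        # do not end on punctuation
       )
       (?!\[/url\])                         # not followed by [/url]
    )
    %x';

// Sample input made up for this sketch, one item per line.
$str = "[url]http://foo.bar[/url]\nhttp://example.com\nwww.example.org.";

// $count is the number of matches; $matches[0] holds the matched text.
$count = preg_match_all($pattern, $str, $matches);

print_r($matches[0]);
```

Note that the body uses ".", which also matches spaces, so on a line such as "http://example.com today" the match runs on to the end of the line. Narrowing the body to non-whitespace characters would be one way to tighten it.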


Hmmm, I tried implementing this and it didn't seem to do anything. Using my text formatting function (see below), I tested

 

[img=http://www.google.ca/images/nav_logo3.png]
http://google.com

 

But all that was output was 10. Did I implement the preg_match_all() correctly?

 

I also checked the PHP Manual, but it wasn't that big of a help, to be honest.

 

<?php
function formatText($str) {
	$str = stripslashes($str);
	$str = htmlentities($str, ENT_COMPAT, 'UTF-8');
	$str = nl2br($str);
	$search = array(
					'/\[b\](.*?)\[\/b\]/si', 
					'/\[i\](.*?)\[\/i\]/si', 
					'/\[u\](.*?)\[\/u\]/si', 
					'/\[align\=(left|center|right)\](.*?)\[\/align\]/si', 
					'/\[img\](.*?)\[\/img\]/si', 
					'/((mailto:|(http|ftp|nntp|news):\/\/).*?)(\s|<|\)|"|\\\\|\'|$)/'
					);
	$replace = array(
					'<b>$1</b>', 
					'<i>$1</i>', 
					'<u>$1</u>', 
					'<div style="text-align: $1;">$2</div>', 
					'<a href="$1"><img src="$1" class="image"></a>', 
					'<a href="$1" rel="nofollow" target="_blank">$1</a>$4'
					);
	$str = preg_match_all('%((?>(?:(?:(?:https?|ftp)://)|www\.)(?:(?!\[/img\]).)+(?<!\p{P}))(?!\[/img\]))%x', $str, $matches);
	$str = preg_replace($search, $replace, $str);
	return $str;
}
?>


Well, looking at that, your mailto: will also muck up the regular expression.

 

Instead of preg_match_all, use preg_replace:

$str = preg_replace('%((?>(?:(?:(?:https?|ftp)://)|www\.)(?:(?!\[/img\]).)+(?<!\p{P}))(?!\[/img\]))%x', "<a href='\\1'>\\1</a>", $str);

 

But I have a feeling that your mailto will cause issues.
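
As a quick sketch of the difference: preg_match_all returns the number of matches (and fills $matches), so assigning its result to $str replaces the text with a count, which would explain a bare number being printed. preg_replace returns the rewritten string instead. The sample text and the simplified pattern below are made up for illustration.

```php
<?php
// Made-up sample text and a deliberately simple URL pattern.
$text    = 'one http://a.example two http://b.example';
$pattern = '%https?://\S+%';

// preg_match_all returns how many matches were found (an int),
// and stores the matched strings in $matches[0].
$count = preg_match_all($pattern, $text, $matches);

// preg_replace returns the text with every match rewritten.
$linked = preg_replace($pattern, '<a href="$0">$0</a>', $text);

var_dump($count);
echo $linked, "\n";
```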

