
Automated target replacement regex help


iLuke


Hi guys!

 

Got a bit of a challenge here - I'm trying to do something with regular expressions that sounds fairly easy:

 

I want to root through a big chunk of text, find all the links, and add a target='_blank' to any links where the domain isn't the same as my site. I'm basically running this function alongside a blog, so that the user never has to worry about manually adding a link target - all external links open in new tabs.

 

The code I've come up with so far is as follows (it's running through OOP, but I don't think that should matter):

$str = preg_replace(
		'/\<a href="((?:http:\/\/)?(?:www\.)?(?!MYDOMAIN).*)"\>([a-zA-Z0-9-_#\.]*)\<\/a\>/i',
		'<a href="$1" target="_blank">$2</a>',
		$str
	);

This isn't working at all, but it isn't giving any errors either, and I've been using an online regular expression checker to make sure no errors are returned.

 

I've already learned a whole load trying to get this working: negative lookahead assertions, backreferences, non-capturing groups using the ?: notation, etc. I never knew regex could get so confusing!

 

I just wondered if anyone out there might know how this could be done? The text that the function will search is raw HTML, and it will have had addslashes() applied for security, but I can easily reverse this if it would cause an issue.
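On the addslashes() point, here is a rough sketch of why the escaping could matter. It assumes a deliberately simplified pattern and a placeholder URL (example.com), neither of which comes from the code above. addslashes() puts a backslash in front of each quote, so a pattern that expects href=" directly would no longer see it until the escaping is reversed with stripslashes().

```php
<?php
// Placeholder markup; example.com is only an illustration.
$raw     = '<a href="http://www.example.com">x</a>';
$escaped = addslashes($raw); // quotes become \"

// Simplified, hypothetical pattern: just looks for href="...".
$pattern = '~<a href="([^"]*)">~i';

$before = preg_match($pattern, $escaped);               // escaped text
$after  = preg_match($pattern, stripslashes($escaped)); // escaping reversed

var_dump($before); // int(0): the backslash sits between = and "
var_dump($after);  // int(1): matches once the slashes are stripped
```

So if the text has been escaped, it is probably safer to strip the slashes, run the replacement, and re-apply the escaping afterwards.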

 

Thanks so much in advance! I hope this isn't too tricky for someone who knows what they're doing, but I'm stumped!

- Luke


Here's my take. It's not perfect, but it's a good starting point:

 

<?php

// example links
$content = <<<EOF
<a href='http://www.somedomain.com'>a</a>
<a href='http://www.mydomain.com'>b</a>
<a href='somepage.php'>c</a>
<a id='blah' href='http://www.somedomain.com'>d</a>
<a href='http://www.somedomain.com' id='blah'>e</a>
<a href='http://www.mydomain.com' id='blah'>f</a>
EOF;

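function checkDomain ($matches) {
  // $matches[4] is the URL inside the href quotes
  $url = parse_url(strtolower($matches[4]));
  // relative links have no host; treat those and our own domain as internal
  $target = (empty($url['host']) || ($url['host']=='www.mydomain.com')) ? "" : " target='_blank'";
  // drop the full match and the bare quote capture, then glue the rest back together
  unset($matches[0],$matches[3]);
  $matches = implode('',$matches);
  return "<a".$target.$matches.">";
}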

echo "<pre>".htmlspecialchars($content)."</pre>"; // before

$content = preg_replace_callback('~<a([^>]*)(href\s?=\s?([\'"]))((?:(?!\3).)*)(\3)([^>]*)>~i','checkDomain',$content);

echo "<pre>".htmlspecialchars($content)."</pre>"; // after

?>

 

Before:

<a href='http://www.somedomain.com'>a</a>
<a href='http://www.mydomain.com'>b</a>
<a href='somepage.php'>c</a>
<a id='blah' href='http://www.somedomain.com'>d</a>
<a href='http://www.somedomain.com' id='blah'>e</a>
<a href='http://www.mydomain.com' id='blah'>f</a>

 

After:

<a target='_blank' href='http://www.somedomain.com'>a</a>
<a href='http://www.mydomain.com'>b</a>
<a href='somepage.php'>c</a>
<a target='_blank' id='blah' href='http://www.somedomain.com'>d</a>
<a target='_blank' href='http://www.somedomain.com' id='blah'>e</a>
<a href='http://www.mydomain.com' id='blah'>f</a>

 

At least one improvement can be made: this doesn't account for a target attribute that already exists. Additional code could check whether one is there, and then either overwrite it or leave it alone.
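A minimal sketch of the "leave it alone" option, assuming the same pattern and the same placeholder host (www.mydomain.com) as the example above. The function name checkDomainKeepTarget is made up for illustration. It returns the tag untouched whenever a target attribute is already present, and only adds target='_blank' to external links otherwise.

```php
<?php
// Hypothetical variant of checkDomain() that respects an existing target.
function checkDomainKeepTarget(array $matches): string
{
    // $matches[0] is the whole opening <a ...> tag.
    if (preg_match('~\starget\s*=~i', $matches[0])) {
        return $matches[0]; // an existing target attribute wins
    }

    // Returns null for relative URLs such as somepage.php.
    $host = parse_url(strtolower($matches[4]), PHP_URL_HOST);
    if (empty($host) || $host === 'www.mydomain.com') {
        return $matches[0]; // relative or internal link: no change
    }

    // Rebuild the tag: attributes before href, href=', URL, closing quote, the rest.
    return "<a target='_blank'" . $matches[1] . $matches[2]
         . $matches[4] . $matches[5] . $matches[6] . ">";
}

$pattern = '~<a([^>]*)(href\s?=\s?([\'"]))((?:(?!\3).)*)(\3)([^>]*)>~i';

$html = "<a href='http://www.somedomain.com' target='_self'>a</a>\n"
      . "<a href='http://www.somedomain.com'>b</a>";

$result = preg_replace_callback($pattern, 'checkDomainKeepTarget', $html);
echo $result;
```

With this input, link a should keep its target='_self' and link b should gain target='_blank'. The target check is deliberately crude (it would also trip on a URL that happens to contain " target="), so treat it as a starting point rather than a complete solution.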

 

Also, IMO this would probably be a lot easier to do client-side if you are using a framework like jQuery.


Ah, excellent stuff! Yours is a more efficient way of going about it in terms of logic, even if the code is a bit heftier. To be honest, I hadn't thought of running some code against the text outside of the regular expression, hah!

 

I am using jQuery for this client-side as well, when the code is written, but this is just intended as a catch-net in case that fails (it failed a few times, which prompted me to try this method).

 

I'm surprised no one seems to have tried this before (or at least not anywhere I could easily find by searching), since it seems like a fairly common thing to want to do.

 

Anyhow, thank you very much for your time mate - much appreciated. I'm still fairly new to regular expressions and I'm still in awe at how powerful they are (and how damned complex at times!).

 

Thanks again,

Luke.

 

