iLuke Posted July 11, 2011

Hi guys! Got a bit of a challenge here. I'm trying to do something with regular expressions that sounds fairly easy: I want to go through a big chunk of text, find all the links, and add target='_blank' to any link whose domain isn't the same as my site's. I'm running this function alongside a blog, so that the user never has to worry about manually adding a link target: all external links open in new tabs.

The code I've come up with so far is as follows (it's running inside a class, but I don't think that should matter):

$str = preg_replace( '/\<a href="((?:http:\/\/)?(?:www\.)?(?!MYDOMAIN).*)"\>([a-zA-Z0-9-_#\.]*)\<\/a\>/i', '<a href="$1" target="_blank">$2</a>', $str );

This isn't working at all, but it's not giving any errors either, and I've been using an online regular expression checker to make sure no errors are returned.

I've already learned a whole load trying to get this working: negative lookahead assertions, backreferences, non-capturing groups using the ?: notation, etc. I never knew regex could get so confusing! I just wondered if anyone out there might know how this could be done?

The text that the function will search is raw HTML, and will have had addslashes() applied for security, but I can easily reverse this if that will cause an issue.

Thanks so much in advance! I hope this isn't too tricky for someone who knows what they're doing, but I'm stumped!

- Luke
.josh Posted July 11, 2011

Here's my take. It's not really perfect, but it's a good starter:

<?php
// example links
$content = <<<EOF
<a href='http://www.somedomain.com'>a</a>
<a href='http://www.mydomain.com'>b</a>
<a href='somepage.php'>c</a>
<a id='blah' href='http://www.somedomain.com'>d</a>
<a href='http://www.somedomain.com' id='blah'>e</a>
<a href='http://www.mydomain.com' id='blah'>f</a>
EOF;

// callback: add target='_blank' unless the link is relative or points at our own host
function checkDomain ($matches) {
  // $matches[4] is the href value
  $url = parse_url(strtolower($matches[4]));
  // empty() avoids an undefined-index notice when the URL has no host (relative links)
  $target = (empty($url['host']) || ($url['host']=='www.mydomain.com')) ? "" : " target='_blank'";
  // drop the full match and the bare quote capture, then glue the remaining pieces back together
  unset($matches[0],$matches[3]);
  $matches = implode('',$matches);
  return "<a".$target.$matches.">";
}

echo "<pre>".htmlspecialchars($content)."</pre>"; // before
$content = preg_replace_callback('~<a([^>]*)(href\s?=\s?([\'"]))((?:(?!\3).)*)(\3)([^>]*)>~i','checkDomain',$content);
echo "<pre>".htmlspecialchars($content)."</pre>"; // after
?>

Before:

<a href='http://www.somedomain.com'>a</a>
<a href='http://www.mydomain.com'>b</a>
<a href='somepage.php'>c</a>
<a id='blah' href='http://www.somedomain.com'>d</a>
<a href='http://www.somedomain.com' id='blah'>e</a>
<a href='http://www.mydomain.com' id='blah'>f</a>

After:

<a target='_blank' href='http://www.somedomain.com'>a</a>
<a href='http://www.mydomain.com'>b</a>
<a href='somepage.php'>c</a>
<a target='_blank' id='blah' href='http://www.somedomain.com'>d</a>
<a target='_blank' href='http://www.somedomain.com' id='blah'>e</a>
<a href='http://www.mydomain.com' id='blah'>f</a>

At least one improvement can be made: this doesn't account for a target attribute that already exists. Additional code could check whether it is there and then either overwrite it or leave the existing value in place.

Also, IMO this would probably be a lot easier to do client-side if you are using a framework like jQuery.
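The improvement mentioned in the reply above (checking for an existing target attribute) could look roughly like the following. This is only a sketch and not code from the thread: the function names addTargetUnlessPresent and addExternalTargets are made up for illustration, 'www.mydomain.com' is the same placeholder host used above, and the sketch assumes an existing target should be left alone rather than overwritten. It also assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper (not from the thread): same idea as checkDomain(),
// but an anchor that already has a target attribute is returned unchanged.
function addTargetUnlessPresent(array $matches): string
{
    $tag = $matches[0]; // the whole opening <a ...> tag

    // Leave the tag alone if it already declares a target.
    if (preg_match('~\starget\s*=~i', $tag)) {
        return $tag;
    }

    // parse_url() gives null when there is no host (relative link)
    // and false when the URL is seriously malformed.
    $host = parse_url(strtolower($matches[2]), PHP_URL_HOST);
    if (!$host || $host === 'www.mydomain.com') { // placeholder host
        return $tag;
    }

    // Insert the attribute directly after "<a".
    return substr_replace($tag, " target='_blank'", 2, 0);
}

// Hypothetical wrapper: runs the callback over every opening anchor tag.
function addExternalTargets(string $html): string
{
    // Group 1: the quote character; group 2: the href value.
    return preg_replace_callback(
        '~<a\b[^>]*?\bhref\s*=\s*([\'"])(.*?)\1[^>]*>~i',
        'addTargetUnlessPresent',
        $html
    );
}

echo addExternalTargets("<a href='http://www.somedomain.com'>a</a>"), "\n";
// prints: <a target='_blank' href='http://www.somedomain.com'>a</a>
echo addExternalTargets("<a href='http://www.somedomain.com' target='_self'>b</a>"), "\n";
// prints the tag unchanged, because a target is already present
```

The target check is deliberately crude: it looks at the whole opening tag, so a URL that itself contains the text " target=" would be skipped. For anything beyond a safety net, parsing the HTML with a DOM parser is likely more robust than a regex.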
iLuke Posted July 11, 2011

Ah, excellent stuff. Yours is a more efficient way of going about it in terms of logic, even if the code is a bit heftier. To be honest, I'd not thought of running some code against the text outside of the regular expression itself, hah!

I am using jQuery client-side for this as well, but the PHP version is just intended as a safety net for when that fails (it failed a few times, which prompted me to try this method).

I'm surprised no one seems to have tried this before (or at least not anywhere I could easily find by searching), as it seems like a fairly common thing to want to do.

Anyhow, thank you very much for your time mate, much appreciated. I'm still fairly new to regular expressions and I'm still in awe at how powerful they are (and how damned complex at times!).

Thanks again,
Luke.