jibster, posted March 1, 2010

In PHP, I'm trying to create a pattern that will add an <a href> link to certain phrases if they are found within the content. The keywords are loaded from a CSV file and stored in an array; I then loop through the array and process each one.

The rules are:

* Only match if the word is NOT within a heading tag (h1-h6)
* Case insensitive
* Only replace if the whole word is found, e.g. don't match 'mens clothes' in 'womens clothes'
* The case of the replacement should be the same as the original content, NOT the case of the keyword. So if we find 'this' in 'THIS', the link text should remain 'THIS'.

I'm pretty poor at regexp, but here's what I've managed to cobble together. OK, don't laugh:

    $result = preg_replace('%[^(<h1>)]\b(designer clothes)\b[^(<\/h1>)]%i', '<a href="">$1</a>', $content, -1, $count);

The above kind of works. From the original content:

    'the benefits of mens designer clothes and what it can do for them.'

we get:

    'the benefits of mens<a href="">designer clothes</a>and what it can do for them.'

Why are the spaces before the <a href=""> and after the </a> missing? I know I could just add them to the replacement string, but that doesn't seem very elegant. I also know the above will only match h1 tags.

I'd be very grateful if the above could be improved upon, or for any suggestions at all.

Cheers,
Jon
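The missing spaces come from the pattern itself. `[^(<h1>)]` is not a "not inside <h1>" test: it is a character class that matches exactly one character that is not `(`, `<`, `h`, `1`, `>` or `)`. Here that character is the space on each side of the phrase, so both spaces become part of the match and are replaced along with it. `\b` on its own is zero-width, so it enforces the whole-word rule without eating anything. A small comparison, using the sentence from the post (the `$consuming`, `$zeroWidth` and `$upper` names are just for illustration):

```php
<?php
// The sentence from the post; the empty href is kept as in the original.
$content = 'the benefits of mens designer clothes and what it can do for them.';

// Original pattern: each [^(<h1>)] is a character class that consumes one
// character (here, the space on each side), so those spaces are replaced too.
$consuming = preg_replace('%[^(<h1>)]\b(designer clothes)\b[^(<\/h1>)]%i',
    '<a href="">$1</a>', $content);

// \b is zero-width: it asserts a word boundary without consuming anything,
// so the spaces stay where they were.
$zeroWidth = preg_replace('%\b(designer clothes)\b%i',
    '<a href="">$1</a>', $content);

// $1 re-inserts the text as it appeared in the content, so the original
// case survives even though the match itself is case-insensitive (/i).
$upper = preg_replace('%\b(designer clothes)\b%i',
    '<a href="">$1</a>', 'MENS DESIGNER CLOTHES');

echo $consuming, "\n";
echo $zeroWidth, "\n";
echo $upper, "\n";
```

The `\b`-only version has no heading check at all; it only shows where the spaces went.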
thebadbad, posted March 2, 2010

One way you can do it, although it's not very elegant:

    $content = 'Testing a test with this! Tag: <span title="test">tag</span>. <h1>Test heading</h1>';

    // replace keywords (that are not part of HTML tags)
    $content = preg_replace('~\btest\b(?![^<]*?>)~i', '<a href="" UNIQUE>$0</a>', $content);

    // remove created links between heading tags
    function _callback($matches) {
        return preg_replace('~<a href="[^"]*" UNIQUE>(.*?)</a>~s', '$1', $matches[0]);
    }
    $content = preg_replace_callback('~<h([1-6])\b[^>]*>.+?</h\1>~is', '_callback', $content);

    // remove UNIQUE marks
    $content = preg_replace('~(<a href="[^"]*") UNIQUE>~', '$1>', $content);

    header('Content-type: text/plain; charset=utf-8');
    echo $content;
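The code above handles one hard-coded keyword. Since the keywords in the question come from a CSV array (for example, rows read with fgetcsv()), here is a sketch that swaps the UNIQUE-marker passes for a different technique: split out the heading elements first with preg_split() and PREG_SPLIT_DELIM_CAPTURE, then link keywords only in the pieces between them. `link_keywords()`, `/shop` and the sample strings are hypothetical, and it assumes the keywords are plain words and headings are not nested.

```php
<?php
// Hypothetical helper: wrap every keyword in a link, except inside
// <h1>..<h6> elements and inside tag markup (e.g. attribute values).
function link_keywords($html, array $keywords, $href)
{
    if (!$keywords) {
        return $html; // an empty alternation would match everywhere
    }

    // Longest keywords first, so "designer clothes" wins over "designer".
    usort($keywords, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    // Escape each keyword so regex metacharacters in the CSV match literally.
    $escaped = array_map(function ($k) {
        return preg_quote($k, '~');
    }, $keywords);

    // Whole words only, case-insensitive, and not followed by a '>' before
    // the next '<' (i.e. not inside a tag), as in thebadbad's lookahead.
    $pattern = '~\b(?:' . implode('|', $escaped) . ')\b(?![^<]*>)~i';

    // With one capturing group and PREG_SPLIT_DELIM_CAPTURE, even indexes
    // are text between headings and odd indexes are the heading elements.
    $parts = preg_split('~(<h[1-6]\b[^>]*>.*?</h[1-6]>)~is', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE);

    $link = htmlspecialchars($href, ENT_QUOTES);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // $m[0] is the matched text, so the content's own case is kept.
            $parts[$i] = preg_replace_callback($pattern, function ($m) use ($link) {
                return '<a href="' . $link . '">' . $m[0] . '</a>';
            }, $part);
        }
    }
    return implode('', $parts);
}

// Example: in practice the keyword array would be built from the CSV.
$keywords = array('designer clothes', 'mens clothes');
echo link_keywords('the benefits of mens designer clothes and more', $keywords, '/shop'), "\n";
```

Joining all keywords into one alternation, longest first, means each piece of text is scanned once, so a shorter keyword such as 'designer' cannot re-match text inside a link that a longer keyword already created, which a per-keyword loop could do.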
jibster (author), posted March 8, 2010

Thanks for the reply thebadbad, I appreciate the time.

The way I found to do it was:

    $content = preg_replace('%(?!<h[1-6]>)\b(designer clothes)\b(?!<\/h[1-6]>)%i', '<a href="">$1</a>', $content, -1, $count);

The ?! is apparently a negative lookahead, which I *think* means the match only goes through if that text is not there. I'm not entirely sure whether that's the way to go, but it works and it's only one line. Maybe I've fluked it, I don't know.

Thanks again for your post.
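`(?!...)` is a negative lookahead: it succeeds only if the text at that position does not match, and like `\b` it consumes nothing, which is why the spaces survive in this version. It only looks forward, though. The leading `(?!<h[1-6]>)` is tested at the start of "designer", where the text can never begin with `<h`, so it always passes; the trailing one only blocks a phrase immediately followed by a closing heading tag. A quick check of jibster's pattern (with the closing quote of `href=""` in place; the test strings are made up):

```php
<?php
// jibster's pattern and replacement.
$pattern = '%(?!<h[1-6]>)\b(designer clothes)\b(?!<\/h[1-6]>)%i';
$replacement = '<a href="">$1</a>';

// Still linked: the phrase is inside a heading, but neither lookahead
// sees a heading tag directly next to it.
$middle = preg_replace($pattern, $replacement, '<h1>Best designer clothes for men</h1>');

// Not linked: the trailing lookahead sees "</h2>" right after the phrase.
$end = preg_replace($pattern, $replacement, '<h2>Buy designer clothes</h2>');

// Linked, spaces intact, because lookaheads are zero-width.
$body = preg_replace($pattern, $replacement, 'mens designer clothes and more');

echo $middle, "\n", $end, "\n", $body, "\n";
```

So the one-liner catches a phrase that ends a heading but not one in the middle of it, which is why thebadbad's version handles headings in a separate pass.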