preg_replace if not in <h1></h1> tags

In PHP, I'm trying to create a pattern that will wrap certain phrases in an <a href> link wherever they are found within the content. The keywords are loaded from a CSV file and stored in an array. I then loop through the array and process each keyword.


The rules are:


    * Only match if word is NOT within a Heading tag (h1-h6)

    * Case insensitive

    * Only replace if the whole word is found, e.g. don't match 'mens clothes' in 'womens clothes'

    * The case of the replacement should be the same as the original content, NOT the case of the keyword. So if we find 'this' in 'THIS', the link text should remain 'THIS'.


I'm pretty poor at regexp but here's what I've managed to cobble together. Ok, don't laugh:


$result = preg_replace('%[^(<h1>)]\b(designer clothes)\b[^(<\/h1>)]%i','<a href="">$1</a>',$content, -1, $count); 


The above kind of works. So from the original content:


'the benefits of mens designer clothes and what it can do for them.'


We get:


'the benefits of mens<a href="">designer clothes</a>and what it can do for them.'


Why is there no space before the <a href="">? I know I could just add it to the replacement string, but that doesn't seem very elegant. And I know the above will only match h1 tags. I'd be very grateful for any improvements or suggestions.
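On the missing space: `[^(<h1>)]` is a character class, not a group, so it matches exactly one character that is not `(`, `<`, `h`, `1`, `>` or `)`. Here that character is the space on each side of the phrase. Because those spaces sit outside `$1`, they are consumed by the match and dropped from the replacement. A minimal sketch comparing the original pattern with one that drops the character classes (this only illustrates the space issue, it does not handle headings):

```php
<?php
$content = 'the benefits of mens designer clothes and what it can do for them.';

// Original pattern: each [^...] class consumes one character (here, a space),
// and that character is not part of $1, so it is lost in the replacement.
$original = preg_replace(
    '%[^(<h1>)]\b(designer clothes)\b[^(<\/h1>)]%i',
    '<a href="">$1</a>',
    $content
);

// Without the character classes, \b alone enforces the whole-word rule
// and nothing outside the phrase is consumed.
$fixed = preg_replace('%\b(designer clothes)\b%i', '<a href="">$1</a>', $content);

echo $original, "\n";
// the benefits of mens<a href="">designer clothes</a>and what it can do for them.
echo $fixed, "\n";
// the benefits of mens <a href="">designer clothes</a> and what it can do for them.
```

A side effect of the character classes is that the original pattern cannot match when the phrase is the very first or very last thing in the content, since there is no character for the class to consume.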


Cheers, Jon

One way you can do it, although it's not very elegant:


$content = 'Testing a test with this! Tag: <span title="test">tag</span>. <h1>Test heading</h1>';
//replace keywords (that are not part of HTML tags)
$content = preg_replace('~\btest\b(?![^<]*?>)~i', '<a href="" UNIQUE>$0</a>', $content);
//remove created links between heading tags
function _callback($matches) {
    //strip the marked links from inside the matched heading, keeping the link text
    return preg_replace('~<a href="[^"]*" UNIQUE>(.*?)</a>~s', '$1', $matches[0]);
}
$content = preg_replace_callback('~<h([1-6])\b[^>]*>.+?</h\1>~is', '_callback', $content);
//remove UNIQUE marks
$content = preg_replace('~(<a href="[^"]*") UNIQUE>~', '$1>', $content);
header('Content-type: text/plain; charset=utf-8');
echo $content;
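The same result can probably be had in a single pass, which avoids the temporary UNIQUE marker. The sketch below is not the code from the post above but a swapped-in technique: the pattern matches whole headings and tags first and hands them back unchanged, so only keywords in ordinary text get wrapped. The function name `link_keywords` and its parameters are made up for illustration, and it assumes tag attributes never contain a literal `>`.

```php
<?php
// Hypothetical helper: link whole-word, case-insensitive keyword matches,
// leaving h1-h6 elements and the insides of HTML tags untouched.
function link_keywords(string $content, array $keywords, string $href = ''): string
{
    // Escape each keyword so characters such as "." or "+" match literally.
    $alternatives = array_map(function ($k) {
        return preg_quote($k, '~');
    }, $keywords);

    $pattern = '~<h([1-6])\b[^>]*>.*?</h\1>'   // branch 1: a whole heading element
             . '|<[^>]+>'                      // branch 2: any other tag
             . '|\b(' . implode('|', $alternatives) . ')\b~is'; // branch 3: a keyword

    return preg_replace_callback($pattern, function ($m) use ($href) {
        // Group 2 is only filled when the keyword branch matched;
        // headings and tags are returned exactly as found.
        if (!isset($m[2]) || $m[2] === '') {
            return $m[0];
        }
        // $m[2] is the text as it appears in the content, so its case is kept.
        return '<a href="' . htmlspecialchars($href) . '">' . $m[2] . '</a>';
    }, $content);
}

$content = 'Testing a test with this! Tag: <span title="test">tag</span>. <h1>Test heading</h1>';
echo link_keywords($content, ['test']), "\n";
// Testing a <a href="">test</a> with this! Tag: <span title="test">tag</span>. <h1>Test heading</h1>
```

Passing the whole keyword array at once also avoids re-linking text inside links created by an earlier keyword, which can happen when looping with one preg_replace per keyword.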

Thanks for the reply thebadbad, appreciate the time. The way I found to do it was:


$content = preg_replace('%(?!<h[1-6]>)\b(designer clothes)\b(?!<\/h[1-6]>)%i','<a href="">$1</a>', $content, -1, $count);


The ?! is a negative lookahead apparently, which I *think* means the match only goes ahead if the pattern inside it is NOT found at that position.


I'm not entirely sure whether that's the way to go, but it works and is only a line. Maybe I've fluked it, I don't know.
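A note for anyone reusing the one-liner: `(?!...)` is a zero-width negative lookahead, so it only checks the characters at the exact position where it appears. The first lookahead sits directly before the phrase, where the next character is always `d`, so it can never fail. The second only fails when a closing heading tag follows the phrase immediately. A quick check, using made-up sample strings, suggests the pattern still links phrases in the middle of a heading:

```php
<?php
$pattern = '%(?!<h[1-6]>)\b(designer clothes)\b(?!<\/h[1-6]>)%i';
$replace = '<a href="">$1</a>';

// Sample inputs are assumptions, chosen to exercise the second lookahead.
$samples = [
    '<h1>designer clothes</h1>',          // phrase directly followed by </h1>
    '<h1>Buy designer clothes here</h1>', // phrase in the middle of a heading
];

$results = [];
foreach ($samples as $s) {
    $results[] = preg_replace($pattern, $replace, $s);
}

print_r($results);
// [0] => <h1>designer clothes</h1>                              (left alone)
// [1] => <h1>Buy <a href="">designer clothes</a> here</h1>      (still linked)
```

So the one-liner protects a heading only when the phrase is the last thing before the closing tag. For headings in general, an approach that matches the whole heading element looks like the safer route.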


Thanks again for your post
