
preg replacing a word in a string, but with exceptions: I need an alternative to a negative variable length lookbehind



Hi

It seems I have finally arrived at a problem requiring regular expressions (I never needed them before). I tried my best, but I could use some help.

 

Simplified problem: I need to replace a word in a text, but with exceptions. If the word is between two specific (HTML) tags, it must not be replaced. Example: in the sentence below I need to make every word 'sun' bold, except the second one, because it is somewhere between italics tags:

 

"The word sun must be bold. <i>But not this instance of sun, because it is between italics</i>. This sun though, should be bold again."

 

So I went like this:

 

$triggerword="sun";

$replace_with = "<b>sun</b>";

$string = "The sun must be bold. <i>But not this instance of sun because it is between italics</i>. This sun though, should be bold again.";

 

This will make all instances of sun bold:

preg_replace("/$triggerword/", $replace_with, $string);

 

I figured I needed negative lookahead to spot the closing </i> tag:

preg_replace("/$triggerword(?!.*<\/i>)/i", $replace_with, $string);

 

That works: the first and second 'sun' are no longer bold, as expected.

 

Now a negative lookbehind to spot the opening <i> tag:

preg_replace("/(?<!<i>.*)$triggerword(?!.*<\/i>)/i", $replace_with, $string);

 

But that doesn't seem to work, apparently because a negative lookbehind may not be variable length. But I cannot use a fixed length, because the number of characters following the <i> tag is variable.

 

So now I am kind of stuck. Any ideas? Part of me thinks I am missing the easy way out. I am also interested in a solution using a completely different method.  Thanks in advance.
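
One possible way around the lookbehind limit, offered as a sketch rather than a definitive answer: let the pattern first consume a whole <i>...</i> block and throw it away with the PCRE verbs (*SKIP)(*FAIL), so the word is only matched outside such blocks. The function name boldOutsideItalics is made up for this example, and the \b word boundaries and the s flag are assumptions that were not part of the original attempt. It also assumes the <i> tags are not nested.

```php
<?php
// Hypothetical helper, not from the original post.
function boldOutsideItalics(string $text, string $word): string
{
    $quoted = preg_quote($word, '/');
    // Left branch: match a complete <i>...</i> block, then discard it
    // with (*SKIP)(*FAIL) so nothing inside it can be matched again.
    // Right branch: the word itself, which is only reached outside those blocks.
    $pattern = '/<i>.*?<\/i>(*SKIP)(*FAIL)|\b' . $quoted . '\b/is';

    // $0 is the matched text, so the original case is kept.
    return preg_replace($pattern, '<b>$0</b>', $text);
}

$string = "The sun must be bold. <i>But not this instance of sun because it is between italics</i>. This sun though, should be bold again.";
echo boldOutsideItalics($string, 'sun'), "\n";
// The <b>sun</b> must be bold. <i>But not this instance of sun because it is between italics</i>. This <b>sun</b> though, should be bold again.
```

This avoids lookbehind entirely, so the variable number of characters after the <i> tag is no longer a problem.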

Are the instances of 'sun' that you don't want replaced ALWAYS within italic tags, or are there other tags that should cause the replacements to be ignored?

 

Also, your current regex would only replace lower-case instances. You can create it so it will match any case of the word and replace it with a matching case.
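
A minimal sketch of what matching any case and keeping that case could look like: the i flag makes the match case-insensitive, and $0 in the replacement puts back the text exactly as it was found. The sample string and the \b word boundaries are assumptions added for illustration.

```php
<?php
$triggerword = "sun";
$string = "Sun, SUN and sun.";

// preg_quote() escapes any regex metacharacters in the trigger word.
$pattern = '/\b' . preg_quote($triggerword, '/') . '\b/i';

// $0 refers to the whole match, so each instance keeps its own case.
$result = preg_replace($pattern, '<b>$0</b>', $string);

echo $result, "\n"; // <b>Sun</b>, <b>SUN</b> and <b>sun</b>.
```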

Thank you for your help.

 

"Are the instances of 'sun' that you don't want replaced ALWAYS within italic tags, or are there other tags that should cause the replacements to be ignored?"

Yes, it is conceivable that there will be more tags that should cause the replacement to be ignored. So a solution that takes multiple 'ignore' tags into account would be even better. Or a reversed approach: I could embed all text in which replacement IS allowed between two specific tags (like SPAN), but that would still leave the variable length issue for the lookbehind.

However, ignoring the replacement between two specific tags would help me a lot already. Any suggestions?
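
For several 'ignore' tags, one approach that might work is to match any block opened by one of those tags up to its matching closing tag and discard it with the PCRE verbs (*SKIP)(*FAIL), leaving the word to be matched only in the remaining text. This is a sketch under assumptions: the function name boldOutsideTags and the tag list are invented for the example, and it does not handle nested tags of the same name or the word appearing inside a tag attribute.

```php
<?php
// Hypothetical helper; the name and the tag list are illustrative only.
function boldOutsideTags(string $text, string $word, array $ignoreTags): string
{
    // Build an alternation such as "i|em|code" from the tag names.
    $tags = implode('|', array_map(
        static function ($tag) {
            return preg_quote($tag, '~');
        },
        $ignoreTags
    ));

    // Left branch: an ignored element from its opening tag to the matching
    // closing tag (\1 repeats the tag name), then discarded via (*SKIP)(*FAIL).
    // Right branch: the word, which is only reached outside ignored elements.
    $pattern = '~<(' . $tags . ')\b[^>]*>.*?</\1>(*SKIP)(*FAIL)|\b'
        . preg_quote($word, '~') . '\b~is';

    return preg_replace($pattern, '<b>$0</b>', $text);
}

$text = "A sun. <em>no sun</em> <code>sun</code> sun";
echo boldOutsideTags($text, 'sun', ['i', 'em', 'code']), "\n";
// A <b>sun</b>. <em>no sun</em> <code>sun</code> <b>sun</b>
```

If the markup gets more complicated than this, parsing it with an HTML parser and replacing only in text nodes is probably the safer route.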

 

"Also, your current regex would only replace lower-case instances. You can create it so it will match any case of the word and replace it with a matching case."

Thanks, I know. I will take this into account.

Edited by wittenberg