[SOLVED] Complex regex search - would like some help please


I'm not particularly good at regular expressions, and this is probably the most complex one I've tried.  Basically, my company is doing some updates for a very large website, and one of the things they'd like us to do is bold a bunch of keywords in the text throughout the site.  So I figured hey, a regex will make that easy... :)

 

So far my regex filters out keywords contained within html and php tags, like it should.  (Wouldn't it be fun to have your php scripts mess up because one of the keywords for the site also happens to be a variable, and you've now stuck tags around it?  Or have your links stop working because those same keywords are also part of file names?)  So here is what I have so far.  (Note: I'm using EditPad Pro, which runs the JGSoft regex engine.)

 

(?<= (?<! \? | TITLE | (?: href=[^>]*) | [0-9] ) > [^<]*)  \b((?: keyword | list | here) '? s?)\b

*note: added spaces and colors for readability

 

Alright, so it currently matches any keyword that follows a > with no < in between (which filters out keywords in the middle of html tags), as long as that > is not preceded by TITLE, a ? (for php tags), a digit (for header tags like <h2>), or an href attribute (for hyperlinks).  But it does match in one case where I don't want it to:

 

<a href="url">some stuff <br>keyword <h2>some stuff</h2></a>

 

If there is another html tag between the <a> and the keyword, the regex matches.  I tried using a lookahead with a negated character class (tried [^<] and [^/]), which works as long as the keyword isn't followed by another ending html tag... like the </h2> in the example above.  Of course, using .* instead of the negated character class means that any keywords preceding a link won't be caught, even if they're not within the link itself.
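For anyone who can run a script instead of a single search-and-replace, here is a minimal sketch in Python (not EditPad Pro's JGSoft engine) of one way around the nested-tag problem: walk the file token by token and keep a counter of how many "skip" elements are currently open, so a <br> inside a link no longer resets anything. The keyword list, the function name bold_keywords, the choice of <b> as the bold tag, and the set of skipped elements (links, title, h1 to h6) are all assumptions made for illustration, and this is not a real HTML parser.

```python
import re

# Placeholder keywords; the real list is not given in the thread.
KEYWORDS = ["keyword", "list", "here"]

# A token is a PHP block, an HTML tag, a run of text, or a stray "<".
TOKEN = re.compile(r"<\?.*?\?>|<[^>]*>|[^<]+|<", re.DOTALL)

# Same shape as the pattern above: keyword, optional apostrophe, optional s.
KEYWORD = re.compile(
    r"\b((?:" + "|".join(map(re.escape, KEYWORDS)) + r")'?s?)\b"
)

# Opening or closing tags of elements whose text should be left alone.
SKIP = re.compile(r"<(/?)(a|title|h[1-6])\b", re.IGNORECASE)


def bold_keywords(html):
    """Wrap keywords in <b> tags, except inside tags, PHP, or skipped elements."""
    out = []
    depth = 0  # number of skipped elements currently open
    for token in TOKEN.findall(html):
        if token.startswith("<"):
            m = SKIP.match(token)
            if m:
                # A closing tag lowers the depth, an opening tag raises it.
                depth = max(depth + (-1 if m.group(1) else 1), 0)
            out.append(token)  # tags and PHP blocks pass through untouched
        elif depth == 0:
            out.append(KEYWORD.sub(r"<b>\1</b>", token))
        else:
            out.append(token)  # text inside a link, title, or header
    return "".join(out)
```

With the example above, the keyword after the <br> is left alone because the <a> is still open, while the same keyword in an ordinary paragraph gets wrapped.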

 

So I'm lost.  I also tried filtering out any > preceded by a ", but then any keywords following an image wouldn't be caught, so that doesn't work for me either.  Any help would be greatly appreciated - this is all a little over my head.

Thanks, but I read that one already.  That would be great if I were allowed to install and run PHP on my computer, but the only regex tool I have is EditPad Pro's search and replace.  That's usually plenty, and luckily I don't have to do this often.  But I really don't want to manually bold keywords in 300+ files...

 

I suppose I could use what I've got and then use a second regex to strip out the unwanted bold tags, but it's a lot of files and I'm afraid it's already going to take a lot of time to run through them once.  Hmm.  Maybe use an if-else?  Could I use that to say 'if you find <a, keep going until you find </a>, otherwise do what I have now'?

 

Think I'll try that.
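For reference, the two-pass fallback (bold everything first, then strip the unwanted bold tags out of links) could be sketched roughly like this in Python. It assumes the bold markup is a plain <b>...</b> pair and that links are not nested; strip_bold_in_links is a made-up name for illustration.

```python
import re

# A whole link, from its opening <a ...> to the nearest </a>.
LINK = re.compile(r"<a\b[^>]*>.*?</a>", re.DOTALL | re.IGNORECASE)

# An opening or closing bold tag.
BOLD = re.compile(r"</?b>", re.IGNORECASE)


def strip_bold_in_links(html):
    """Second pass: remove <b> and </b> wherever they appear inside a link."""
    return LINK.sub(lambda m: BOLD.sub("", m.group(0)), html)
```

One caveat with this approach: it also removes any bold tags that were already inside links before the first pass, so it is only safe if the files had none there to begin with.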

No, I'm on a PC.  I can upload files to our web server, which does run PHP, but I don't think I can run scripts on it outside of web pages.  I probably could put it in a web page and run it there, but I'm just an intern here and I'm not sure my boss would like that.  I've almost got this working with the if-else construct anyway... I'll post it when I figure it out.

Well, I decided that it's not a problem if the keywords within links are bolded as well, so unless the client complains, I'm all set.  That regex worked perfectly; bolded all the keywords (and permutations of them).  665 matches in 278 files...
