miseleigh Posted August 30, 2007

I'm not particularly good at regular expressions, and this is probably the most complex one I've tried. Basically, my company is doing some updates for a very large website, and one of the things they'd like us to do is bold a bunch of keywords in the text throughout the site. So I figured hey, a regex will make that easy...

So far my regex filters out keywords contained within HTML and PHP tags, like it should. (Wouldn't it be fun to have your PHP scripts mess up because one of the keywords for the site also happens to be a variable, and you've now stuck tags around it? Or have your links stop working because those same keywords are also part of file names?)

So here is what I have so far. (Note: using EditPadPro - JGSoft regex engine.)

(?<= (?<! \? | TITLE | (?: href=[^>]*) | [0-9] ) > [^<]*) \b((?: keyword | list | here) '? s?)\b

*note: added spaces and colors for readability

Alright, so it currently matches any keyword preceded by a > with no < in between (which filters out keywords in the middle of HTML tags), where the > is not preceded by TITLE, ? (for PHP tags), a number (for headers), or a hyperlink. But it does match one case when I don't want it to:

<a href="url">some stuff <br>keyword <h2>some stuff</h2></a>

If there is another HTML tag between the <a> and the keyword, the regex matches. I tried using a lookahead with a negated character class (tried [^<] and [^/]), which works as long as the keyword isn't followed by another ending HTML tag... like the </h2> in the example above. Of course, using .* instead of the negated character class means that any keywords preceding a link won't be caught, even if they're not within the link itself.

So I'm lost. I also tried filtering out any > preceded by a ", but then any keywords following an image wouldn't be caught, so that doesn't work for me either.

Any help would be greatly appreciated - this is all a little over my head.
miseleigh Posted August 30, 2007

Quick change, doesn't change functionality:

(?<= (?<! \? | TITLE | <a[^>]* | \d ) > [^<]*) \b((?: keyword | list | here) '? s?)\b

*note: added spaces and colors for readability
effigy Posted August 30, 2007

How about isolating the unwanted content altogether? See this topic.
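For readers following along, here is a rough sketch of that "isolate the unwanted content" idea. It is written in Python purely for illustration (it is not the code from the linked topic, and it is not something EditPadPro can run). The names KEYWORDS, PROTECTED and bold_keywords are made up for this example, and the keyword list is the same placeholder used in the regex above. The idea: split the page into protected spans (PHP blocks, <title>, whole <a>...</a> elements, headings, and any other single tag) and plain text, then bold keywords only in the plain text.

```python
import re

# Hypothetical keyword list, mirroring the placeholders in the regex above.
KEYWORDS = ["keyword", "list", "here"]

# Spans that must be left untouched. Order matters: whole elements are tried
# before the catch-all "any single tag" alternative at the end.
# Exactly one capturing group, so re.split() returns alternating pieces.
PROTECTED = re.compile(
    r"(<\?.*?\?>"                      # PHP blocks
    r"|<title\b[^>]*>.*?</title>"      # page title
    r"|<a\b[^>]*>.*?</a>"              # entire links, including nested tags
    r"|<h[1-6]\b[^>]*>.*?</h[1-6]>"    # headings
    r"|<[^>]+>)",                      # any other single tag
    re.IGNORECASE | re.DOTALL,
)

# Same shape as the original pattern: keyword, optional apostrophe, optional s.
KEYWORD_RE = re.compile(
    r"\b((?:%s)'?s?)\b" % "|".join(re.escape(k) for k in KEYWORDS),
    re.IGNORECASE,
)


def bold_keywords(html):
    """Wrap keywords in <b> tags, but only in text outside protected spans."""
    parts = PROTECTED.split(html)
    # Even indexes are plain text; odd indexes are the protected spans.
    for i in range(0, len(parts), 2):
        parts[i] = KEYWORD_RE.sub(r"<b>\1</b>", parts[i])
    return "".join(parts)
```

Because the whole link is consumed as one protected span, the problem case from the first post (a keyword after a <br> inside an <a>) is left alone. This is only a sketch: regex-based splitting like this will trip on nested links or on a literal > inside an attribute value.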
miseleigh Posted August 30, 2007

Thanks, but I read that one already. That would be great if I were allowed to install and run PHP on my computer, but the only regex tool I have is EditPadPro's search and replace. That's usually plenty; luckily I don't have to do this often. But I really don't want to manually bold keywords in 300+ files...

I suppose I could use what I've got and then use a second regex to strip out the unwanted bold tags, but it's a lot of files and I'm afraid it's already going to take a lot of time to run through them once.

Hmm. Maybe use an if-else? Could I use that to say 'if you find <a, keep going until you find </a>, otherwise do what I have now'? Think I'll try that.
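The two-pass idea mentioned above (bold everything first, then strip the bold tags that ended up inside links) could look roughly like this. Again, this is a Python sketch for illustration only, with made-up names (ANCHOR, BOLD_TAG, strip_bold_in_links), and it assumes the bold markup is a plain <b>...</b> pair.

```python
import re

# A whole link, from the opening <a ...> to the first closing </a>.
ANCHOR = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)

# Opening or closing bold tag (assumes plain <b> with no attributes).
BOLD_TAG = re.compile(r"</?b>", re.IGNORECASE)


def strip_bold_in_links(html):
    """Second pass: remove <b> and </b> wherever they appear inside a link."""
    return ANCHOR.sub(lambda m: BOLD_TAG.sub("", m.group(0)), html)
```

One caveat with this approach: it also removes any bold tags that were already inside links before the first pass, so it is only safe if the site doesn't rely on those.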
effigy Posted August 30, 2007

The only tool available to you is EditPadPro? What about Perl? Do you have access to a Unix server?
miseleigh Posted August 30, 2007

No, I'm on a PC. I can upload files to our web server, which does run PHP, but I don't think I can run scripts on it outside of web pages. I probably could put it in a web page and run it there, but I'm just an intern here and I'm not sure my boss would like that.

I've almost got this working with the if-else construct anyway... I'll post it when I figure it out.
miseleigh Posted August 31, 2007

Well, I decided that it's not a problem if the keywords within links are bolded as well, so unless the client complains, I'm all set. That regex worked perfectly; it bolded all the keywords (and permutations of them). 665 matches in 278 files...