worker201 Posted April 15, 2008

Hey, I'm trying to write a preg_replace that will find and replace some acronyms (all upper-case) when they appear in text. Some simple conditions:
- should match if the acronym is preceded by a space, tab or newline
- should match if the acronym is followed by a space, tab, newline, -, ., ?, !, s, or y

Here's what I have so far (to find MPAA, as an example):

preg_replace("/[\s]MPAA[\.-!\?sy\s]/", $replacement, $subject)

Doesn't work. I kinda understand some of the matching bits, but the syntax of the actual code is very confusing. Especially which characters need to be escaped, whether the surrounding slashes are necessary, and where parentheses need to be used. Any suggestions greatly appreciated.

discomatt Posted April 15, 2008

$regex = '/(?:^|\\s)MPAA(?:[\\s\\-.\\?!sy]|$)/';

This will match if all of the following are true:
- The start of the string, or a whitespace character (linebreak, space, tab, etc.), comes first.
- It is followed by 'MPAA' (case sensitive).
- That is followed by either a whitespace character, -, ., ?, !, s, y, or the end of the string.

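Editor's note on why the pattern in the question fails: inside a character class, `\.-!` is read as a range running from "." down to "!", which is out of order, so the pattern should not compile at all. The sketch below assumes a reasonably recent PHP with the standard PCRE extension; the variable names are made up for illustration. It compares the original class with one where the hyphen is escaped, as in the reply above.

```php
<?php
// Class from the question: ".-!" is parsed as a range from "." (0x2E) to "!" (0x21).
// That range is out of order, so PCRE rejects the whole pattern.
$broken = '/[\s]MPAA[\.-!\?sy\s]/';
$brokenResult = @preg_replace($broken, 'X', ' MPAA ');  // @ hides the compile warning
var_dump($brokenResult);                                 // a failed pattern yields NULL

// Same idea with the hyphen escaped so it is a literal "-", plus start/end anchors.
$fixed = '/(?:^|\s)MPAA(?:[\s\-.?!sy]|$)/';
$matchesQuestion = preg_match($fixed, 'Who are the MPAA?');  // acronym followed by "?"
$matchesEmbedded = preg_match($fixed, 'convertMPAA');        // no whitespace before it

var_dump($matchesQuestion, $matchesEmbedded);
```

On the other questions: the surrounding slashes are the pattern delimiters and are required, and single-quoted PHP strings are usually easier for patterns because a backslash there only needs doubling before another backslash or a quote.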
worker201 Posted April 15, 2008

Thanks for the speedy reply. Having some troubles with your solution.

Complete code:

$reptext = preg_replace('/(?:^|\\s)MPAA(?:[\\s\\-.\\?!sy]|$)/', '<span class="acro" title="Motion Picture Association of America">MPAA</span>', $reptext);

Tested on:

MPAA

Output:

span class="acro" title="Motion Picture Association of America">MPAA

It looks like there is some kind of replication going on here. Any ideas?

discomatt Posted April 15, 2008

These are my results.

<?php
$replace = ' <span class="acro" title="Motion Picture Association of America">MPAA</span> ';
$regex = '/(?:^|\\s)MPAA(?:[\\s\\-.\\?!sy]|$)/';

$subject = 'MPAA';
echo preg_replace($regex, $replace, $subject);
echo "<br>\n";

$subject = 'This is some text with MPAA in it!';
echo preg_replace($regex, $replace, $subject);
echo "<br>\n";

$subject = 'MPAA stinks';
echo preg_replace($regex, $replace, $subject);
echo "<br>\n";

$subject = 'Who are the MPAA?';
echo preg_replace($regex, $replace, $subject);
echo "<br>\n";

$subject = 'MPAAy for some more testing';
echo preg_replace($regex, $replace, $subject);
echo "<br>\n";

$subject = 'This shouldn\'t convertMPAA';
echo preg_replace($regex, $replace, $subject);
echo "<br>\n";

$subject = 'MPAAnor should this!';
echo preg_replace($regex, $replace, $subject);
echo "<br>\n";
?>

Outputs:

 <span class="acro" title="Motion Picture Association of America">MPAA</span> <br>
This is some text with <span class="acro" title="Motion Picture Association of America">MPAA</span> in it!<br>
 <span class="acro" title="Motion Picture Association of America">MPAA</span> stinks<br>
Who are the <span class="acro" title="Motion Picture Association of America">MPAA</span> <br>
 <span class="acro" title="Motion Picture Association of America">MPAA</span> for some more testing<br>
This shouldn't convertMPAA<br>
MPAAnor should this!<br>

Seems to be working. Did you not want it to act like this?

worker201 Posted April 15, 2008

Actually, no. If it has a -, ., ?, !, y, s or whitespace character after it, I want to add the span tags only to the MPAA part, while preserving everything else. Using your test cases:

Who are the <span class="acro" title="Motion Picture Association of America">MPAA</span>?
<span class="acro" title="Motion Picture Association of America">MPAA</span>y for some more testing<br>

So the 'y' and the '?' get preserved in this case. Otherwise, seems to be working. I tested your code, and it works; now I have to figure out why it doesn't seem to integrate with my code.

discomatt Posted April 15, 2008

Ah, simple enough!

<?php
$replace = '\1<span class="acro" title="Motion Picture Association of America">MPAA</span>\2';
$regex = '/(^|\\s)MPAA([\\s\\-.\\?!sy]|$)/';
$subject = 'Who are the MPAA?';
echo preg_replace($regex, $replace, $subject);
?>

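Editor's note: one thing to be aware of with the capturing-group approach is that the characters around the acronym are consumed by the match and then put back. If two acronyms are separated by a single space, the space is used up by the first match, so the second one has nothing to match its leading whitespace against. A possible alternative, sketched here and not taken from the thread, is to use lookarounds, which test the neighbouring characters without consuming them. The variable names are hypothetical.

```php
<?php
// Illustrative names only: $span, $capturing and $lookaround are not from the thread.
$span = '<span class="acro" title="Motion Picture Association of America">MPAA</span>';

// Capturing-group version from the post above: neighbours are consumed, then restored.
$capturing = '/(^|\s)MPAA([\s\-.?!sy]|$)/';
$viaGroups = preg_replace($capturing, '\1' . $span . '\2', 'MPAA MPAA');

// Lookaround version: (?<!\S) means "not preceded by a non-whitespace character",
// which covers both start-of-string and leading whitespace; (?=...) peeks ahead.
$lookaround = '/(?<!\S)MPAA(?=[\s\-.?!sy]|$)/';
$viaLookaround = preg_replace($lookaround, $span, 'MPAA MPAA');

echo $viaGroups, "\n";      // only the first MPAA is wrapped
echo $viaLookaround, "\n";  // both are wrapped
echo preg_replace($lookaround, $span, 'Who are the MPAA?'), "\n";
```

With lookarounds the replacement string no longer needs \1 and \2, since nothing besides the acronym itself is part of the match.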
worker201 Posted April 15, 2008

Can you please explain the \1 and \2 in your replace declaration? The description of how the counter(?) works in the PHP manual was less than helpful.

discomatt Posted April 16, 2008

No problem. Regex can be very cryptic at first. I personally recommend RegexBuddy ( http://www.regexbuddy.com/ ) if you use regex on a regular basis or are just curious. It does a great job of explaining everything.

A plain pair of brackets in regex creates a capturing group. This lets you reference whatever those brackets matched at a later time (e.g. in the replacement).

Here's the raw regex:

(^|\s)MPAA([\s\-.\?!sy]|$)

- First, I capture the start of the string, or a whitespace character, in the first capturing group.
- Then I match MPAA (no brackets, so nothing is captured).
- Then I capture one of the characters in the list, or the end of the string, in the second capturing group.

The first capturing group is referenced using \1, the second \2, and if there happened to be a third, fourth... \3, \4, etc.

In PCRE, the engine behind PHP's preg functions, you can create named capturing groups as well, using (?P<foo>[A-Za-z]). Inside the pattern, that group can then be referenced using either \1 or (?P=foo). This is handy for very complex regex statements.

You can also create non-capturing groups (saves memory if you don't want to use the data in the brackets later) by using (?:[A-Za-z])

Hope this helps clear up some curiosities. If you're interested, check out the demo for RegexBuddy... I warn you though, you'll probably get addicted to it.

http://www.regexbuddy.com/cgi-bin/SetupRegexBuddyDemo.exe

Assuming you're running Windows.

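Editor's note: a small sketch of named groups in PHP, for anyone who wants to try them. The group names (before, acro, after) are invented for this example. As far as I know, the replacement string of preg_replace only understands numbered references, so the sketch uses preg_replace_callback, where the match array exposes each group by name as well as by number. The last line shows why a class is better written as [A-Za-z] than [A-z]: the latter also covers the punctuation that sits between "Z" and "a" in ASCII.

```php
<?php
// Hypothetical group names; the pattern otherwise mirrors the one explained above.
$regex = '/(?P<before>^|\s)(?P<acro>MPAA)(?P<after>[\s\-.?!sy]|$)/';

// After a match, each named group is available under its name and its number.
preg_match($regex, 'Who are the MPAA?', $m);
var_dump($m['before'] === $m[1], $m['after'] === $m[3]);

// Rebuild the text from the named pieces, wrapping only the acronym.
$result = preg_replace_callback(
    $regex,
    function (array $m): string {
        return $m['before'] . '<span class="acro">' . $m['acro'] . '</span>' . $m['after'];
    },
    'Who are the MPAA?'
);
echo $result, "\n";

// [A-z] spans more than letters: "_" lies between "Z" and "a", so this matches.
$underscoreMatches = preg_match('/^[A-z]$/', '_');
var_dump($underscoreMatches);
```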
worker201 Posted April 16, 2008

What's Windows? Is that some new virus? Thanks for all your help. Everything is working great now.

discomatt Posted April 17, 2008

Windows is a great OS... some people just can't use it. Now, as far as a server environment goes... I'd say it's more like a plague.