
[SOLVED] Finding simple acronyms


worker201


Hey,

I'm trying to write a preg_replace() call that finds and replaces some acronyms (all upper-case) when they appear in text. Some simple conditions:

- should match if the acronym is preceded by a space, tab or newline

- should match if the acronym is followed by a space, tab, newline, -, ., ?, !, s, or y

 

Here's what I have so far (to find MPAA, as an example):

 preg_replace("/[\s]MPAA[\.-!\?sy\s]/", $replacement, $subject)

It doesn't work. I kind of understand some of the matching bits, but the syntax of the actual code is very confusing, especially which characters need to be escaped, whether the surrounding slashes are necessary, and where parentheses need to be used.
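For reference, the attempt above most likely fails before any matching happens. Inside a character class, \.-! is read as a range from "." to "!", and because "!" comes before "." in ASCII, PCRE rejects the pattern ("range out of order in character class"), so preg_replace() emits a warning and returns NULL. A hyphen placed last in the class (or escaped as \-) is literal. The slashes are required: they are the pattern delimiters, and any non-alphanumeric, non-backslash, non-whitespace character such as # works too. Even once it compiles, the leading [\s] needs a real whitespace character, so an acronym at the very start of the text won't match. A minimal sketch; the subject and the "[acronym]" replacement are made up for illustration:

```php
<?php
// Made-up subject and replacement, just to exercise the patterns.
$subject = 'Who are the MPAA?';
$replacement = '[acronym]';

// Original pattern: "\.-!" inside [...] is parsed as the range "." to "!",
// which is out of order, so compilation fails (@ hides the warning).
$broken = @preg_replace("/[\s]MPAA[\.-!\?sy\s]/", $replacement, $subject);
var_dump($broken); // NULL

// Same idea with the hyphen moved to the end of the class, where it is literal.
// Note the leading space and the trailing '?' are consumed by the match.
$fixed = preg_replace('/[\s]MPAA[.!?sy\s-]/', $replacement, $subject);
var_dump($fixed); // string(20) "Who are the[acronym]"

// Delimiters are required, but they need not be slashes.
$sameAsFixed = preg_replace('#[\s]MPAA[.!?sy\s-]#', $replacement, $subject);
var_dump($sameAsFixed === $fixed); // bool(true)
```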

 

Any suggestions greatly appreciated.

 


$regex = '/(?:^|\\s)MPAA(?:[\\s\\-.\\?!sy]|$)/';

 

This will match if all of the following are true:

 

Start of the string, or a whitespace character (line break, space, tab, etc.) first.

Followed by 'MPAA' (case sensitive)

Followed by either a whitespace, -, ., ?, !, s, y or the end of the string.
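One detail worth seeing before using this in a replacement: the leading whitespace and the trailing character are part of what the pattern consumes, so preg_replace() will overwrite them too. A quick check with preg_match() makes this visible (the subject string is just an example):

```php
<?php
$regex = '/(?:^|\\s)MPAA(?:[\\s\\-.\\?!sy]|$)/';

// Example subject, made up for illustration.
$subject = 'Who are the MPAA?';

if (preg_match($regex, $subject, $m) === 1) {
    // $m[0] is everything the pattern consumed: the space before MPAA and
    // the '?' after it are part of the match, even though the (?:...)
    // groups don't capture them separately.
    var_dump($m[0]); // string(6) " MPAA?"
}
```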


Thanks for the speedy reply. I'm having some trouble with your solution:

 

Complete code:

$reptext = preg_replace('/(?:^|\\s)MPAA(?:[\\s\\-.\\?!sy]|$)/', '<span class="acro" title="Motion Picture Association of America">MPAA</span>', $reptext);

Tested on:

MPAA

Output:

span class="acro" title="Motion Picture Association of America">MPAA

 

It looks like there is some kind of replication going on here.  Any ideas?


These are my results.

 

<?php

$replace = ' <span class="acro" title="Motion Picture Association of America">MPAA</span> ';
$regex = '/(?:^|\\s)MPAA(?:[\\s\\-.\\?!sy]|$)/';

$subject = 'MPAA';
echo preg_replace($regex, $replace, $subject);
echo "<br>\n";

$subject = 'This is some text with MPAA in it!';
echo preg_replace($regex, $replace, $subject);
echo "<br>\n";

$subject = 'MPAA stinks';
echo preg_replace($regex, $replace, $subject);
echo "<br>\n";

$subject = 'Who are the MPAA?';
echo preg_replace($regex, $replace, $subject);
echo "<br>\n";

$subject = 'MPAAy for some more testing';
echo preg_replace($regex, $replace, $subject);
echo "<br>\n";

$subject = 'This shouldn\'t convertMPAA';
echo preg_replace($regex, $replace, $subject);
echo "<br>\n";

$subject = 'MPAAnor should this!';
echo preg_replace($regex, $replace, $subject);
echo "<br>\n";

?>

 

Outputs:

 

 <span class="acro" title="Motion Picture Association of America">MPAA</span> <br>
This is some text with <span class="acro" title="Motion Picture Association of America">MPAA</span> in it!<br>
<span class="acro" title="Motion Picture Association of America">MPAA</span> stinks<br>
Who are the <span class="acro" title="Motion Picture Association of America">MPAA</span> <br>

<span class="acro" title="Motion Picture Association of America">MPAA</span>  for some more testing<br>
This shouldn't convertMPAA<br>
MPAAnor should this!<br>

 

Seems to be working. Did you not want it to act like this?


Actually, no. If it has a -, ., ?, !, y, s or whitespace character after it, I want to add the span tags only around the MPAA part, while preserving everything else.

 

Using your test cases:

Who are the <span class="acro" title="Motion Picture Association of America">MPAA</span>?
<span class="acro" title="Motion Picture Association of America">MPAA</span>y for some more testing<br>

So the 'y' and the '?' get preserved in this case.

 

Otherwise, it seems to be working. I tested your code and it works, so now I have to figure out why it doesn't integrate with my code.


Ah, simple enough!

 

<?php

$replace = '\1<span class="acro" title="Motion Picture Association of America">MPAA</span>\2';
$regex = '/(^|\\s)MPAA([\\s\\-.\\?!sy]|$)/';

$subject = 'Who are the MPAA?';
echo preg_replace($regex, $replace, $subject);

?>
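The capturing-group version has one side effect worth knowing about: because the character after the acronym is consumed, two acronyms separated by a single space ("MPAA MPAA") only get the first one wrapped, since the space the second one needs has already been used up. A lookaround variant avoids that by checking the neighbours without consuming them. The sketch below swaps in a negative lookbehind (?<!\S), meaning "not preceded by a non-whitespace character" (which also covers the start of the string), plus a lookahead for the trailing character. The helper name wrapAcronyms() and its array argument are hypothetical, added only to show how several acronyms could be handled:

```php
<?php
/**
 * Hypothetical helper (not from the thread): wraps each acronym in a <span>
 * whose title is its expansion. The lookbehind (?<!\S) and the lookahead
 * check the surrounding characters without consuming them, so only the
 * acronym itself is replaced and its neighbours are left untouched.
 */
function wrapAcronyms(string $text, array $acronyms): string
{
    foreach ($acronyms as $acronym => $expansion) {
        $regex = '/(?<!\S)' . preg_quote($acronym, '/') . '(?=[\s.?!sy-]|$)/';
        $title = htmlspecialchars($expansion, ENT_QUOTES);

        // A callback keeps any '$' or '\' in the expansion from being
        // interpreted as a backreference in the replacement.
        $text = preg_replace_callback(
            $regex,
            function (array $m) use ($title): string {
                return '<span class="acro" title="' . $title . '">' . $m[0] . '</span>';
            },
            $text
        );
    }
    return $text;
}

echo wrapAcronyms('Who are the MPAA? MPAA MPAA', [
    'MPAA' => 'Motion Picture Association of America',
]), "\n";
```

All three occurrences get wrapped here. One caveat of looping over several acronyms: an acronym that appears inside an earlier acronym's expansion could end up wrapped inside the title attribute.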

 


No problem, regex can be very cryptic at first. I personally recommend RegexBuddy (http://www.regexbuddy.com/) if you use regex regularly or are just curious. It does a great job of explaining everything.

 

A plain set of parentheses in a regex creates a capturing group. This lets you reference whatever matched inside those parentheses later (e.g. in the replacement).

 

Here's the raw regex: (^|\s)MPAA([\s\-.\?!sy]|$)
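The PHP code doubles the backslashes while the raw regex has single ones. In a single-quoted PHP string only \\ and \' are escape sequences and any other backslash is kept as-is, so '\\s' and '\s' end up as the same two characters; doubling just makes the intent explicit. A quick check, not from the original posts:

```php
<?php
// In a single-quoted PHP string, '\\' becomes one backslash and any other
// backslash (as in '\s') is kept literally, so both spellings below produce
// exactly the same pattern text.
$doubled = '/(^|\\s)MPAA([\\s\\-.\\?!sy]|$)/';
$single  = '/(^|\s)MPAA([\s\-.\?!sy]|$)/';

var_dump($doubled === $single); // bool(true)
echo $doubled, "\n";            // /(^|\s)MPAA([\s\-.\?!sy]|$)/
```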

 

First, I capture the start of the string or a whitespace character in the first capturing group.

Then I match MPAA (no parentheses, so nothing is captured).

Then I capture one of the characters in the list, or the end of the string, in the second capturing group.
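Running the same pattern through preg_match() shows what ends up in each capturing group; the subject strings are just examples:

```php
<?php
$regex = '/(^|\s)MPAA([\s\-.\?!sy]|$)/';

// Group 1 holds what came before MPAA, group 2 what came after it.
preg_match($regex, 'Who are the MPAA?', $m);
var_dump($m[1], $m[2]); // " " and "?"

// At the very start and end of the string, ^ and $ match,
// so both groups hold empty strings.
preg_match($regex, 'MPAA', $n);
var_dump($n[1], $n[2]); // "" and ""
```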

 

The first capturing group is referenced using \1, the second using \2, and so on (\3, \4, etc.) if there are more.
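PHP's preg_replace() also accepts $1 in the replacement string, and the braced form ${1} is there for when a digit immediately follows the reference ($10 would mean group 10). A quick sketch; the "[MPAA]" wrapper is just a placeholder:

```php
<?php
$regex = '/(^|\s)MPAA([\s\-.\?!sy]|$)/';
$subject = 'Who are the MPAA?';

// Three equivalent ways to put groups 1 and 2 back into the replacement.
$a = preg_replace($regex, '\1[MPAA]\2', $subject);
$b = preg_replace($regex, '$1[MPAA]$2', $subject);
$c = preg_replace($regex, '${1}[MPAA]${2}', $subject);

echo $a, "\n"; // Who are the [MPAA]?
```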

 

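Many regex flavors also let you create named capturing groups, e.g. (?P<foo>[A-Za-z]), which can then be referenced inside the pattern either by number (\1) or by name with (?P=foo). This is handy for very complex patterns. (Note that [A-z] is not the same as [A-Za-z]: the ASCII range from A to z also includes [, \, ], ^, _ and the backtick.)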
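In PHP, named groups show up in preg_match()'s result array under their name as well as their number. preg_replace() replacement strings only understand numbered references, so there a named group is still referenced as \1 (or through preg_replace_callback() and $m['name']). A sketch, with "lead", "tail" and "word" as made-up group names:

```php
<?php
// (?P<name>...) defines a named group; it is still numbered as well.
$regex = '/(?P<lead>^|\s)MPAA(?P<tail>[\s\-.?!sy]|$)/';

preg_match($regex, 'Who are the MPAA?', $m);
echo $m['tail'], "\n"; // ?
echo $m[2], "\n";      // ? (the same group, by number)

// Inside the pattern, (?P=name) matches the same text again:
// here a doubled word separated by a space.
var_dump(preg_match('/\b(?P<word>[A-Za-z]+) (?P=word)\b/', 'the the', $d)); // int(1)
```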

 

You can also create non-capturing groups by using (?:[A-Za-z]). These save a little memory when you don't need the contents of the group later.
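The difference shows up directly in the match array: a (?:...) group still takes part in matching but gets no numbered slot. Comparing the capturing pattern with the non-capturing one from earlier in the thread:

```php
<?php
$capturing    = '/(^|\s)MPAA([\s\-.\?!sy]|$)/';
$nonCapturing = '/(?:^|\s)MPAA(?:[\s\-.\?!sy]|$)/';
$subject = 'Who are the MPAA?';

preg_match($capturing, $subject, $a);
preg_match($nonCapturing, $subject, $b);

echo count($a), "\n"; // 3: whole match plus two groups
echo count($b), "\n"; // 1: whole match only
```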

 

Hope this helps clear up some curiosities. If you're interested, check out the RegexBuddy demo... I warn you though, you'll probably get addicted to it ;)

 

http://www.regexbuddy.com/cgi-bin/SetupRegexBuddyDemo.exe

 

Assuming you're running Windows :)

