[SOLVED] help with smiley regex

redbullmarky · January 28, 2007

Hi all
I'll be the first to admit I suck at regex stuff...
I have an array of smilies and an array of filenames. I understand that, with preg_quote, I can escape my smilies so they're safe to use in a regex.

However - I just need help with the pattern to search/replace under certain conditions: each side of the actual smiley MUST have one of the following:
1, a space
2, start/end of line
3, punctuation (comma, full stop, bracket, colon, semicolon, question mark, exclamation mark, etc)

My problem is that when I use a pattern with preg_replace, the character I'm checking for on either side gets replaced too. So I guess the question is - how can I preg_replace, yet leave the surrounding character (whatever it is) intact?

Cheers ;)
Mark

c4onastick · January 28, 2007

Hey Mark,
You can use lookaround for stuff like this. effigy is the lookaround master around here.
[code]$pattern = array(
'/(?<=[\s,.;:\'"?!&%$]);-\)(?=[\s,.;:\'"?!&%$])/'
);
$replacements = array(
'<img src=\'smiley_pic.jpg\'>'
);

$test = 'That\'s great!;-) This is a lot of fun ;-)!';
$test = preg_replace($pattern, $replacements, $test);
[/code]

The only problem with this, based on your criteria, is that lookbehinds have to be fixed length, so you can't use alternation to get that start of line anchor. Maybe effigy will know a way around this, I can't think of one off the top of my head.

I have to be honest, I rarely use preg_quote, but you should be able to concat that lookbehind and the lookahead with your smiley to build the array.
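
A minimal sketch of that concatenation, assuming a hypothetical $smilies map and made-up image filenames (neither comes from Mark's code). The second argument to preg_quote also escapes the '/' delimiter, which would matter for a smiley like ':/'.
[code]
<?php
// Hypothetical smiley => replacement map (illustrative names only).
$smilies = array(
    ';-)' => '<img src="wink.jpg">',
    ':-)' => '<img src="smile.jpg">',
);

// Characters allowed on either side, same class as in the pattern above.
$class = '[\s,.;:\'"?!&%$]';

$patterns = array();
$replacements = array();
foreach ($smilies as $smiley => $image) {
    // lookbehind + escaped smiley + lookahead
    $patterns[] = '/(?<=' . $class . ')' . preg_quote($smiley, '/') . '(?=' . $class . ')/';
    $replacements[] = $image;
}

$test = 'That\'s great!;-) This is a lot of fun :-)!';
$result = preg_replace($patterns, $replacements, $test);
echo $result, "\n";
?>
[/code]
Like the pattern above, this still won't catch a smiley at the very start or end of the string, because both lookarounds need an actual character to look at.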

redbullmarky · January 28, 2007

whoa

lookaheads/lookbehinds are a bit new to me. what can they do for me? the regex syntax in your example kinda clouds the smiley bit, as there are lots of : and ) etc - but let's say i have something like:

[code]
<?php
$text = ":) hello this is some text (this is some more in brackets with smily at the end ;)) ok :)!";

$search = array(':)', ';)');
$replace = array('smile', 'wink');

// all the preg_quote / preg_replace stuff here
?>
[/code]

so far, i have a loop which escapes every entry in the smiley array and adds the starting and ending /, then just a simple:

[code]
$text = preg_replace($search, $replace, $text);
[/code]
to do the business. so i guess i'm looking to make sure that brackets, spaces, start/end, etc, are all treated the same. If your example does that, any chance you can split it into its parts so i can see what's what? :)

cheers for your help
Mark

c4onastick · January 28, 2007

[quote author=redbullmarky link=topic=124422.msg515503#msg515503 date=1170007930]
If your example does that, any chance you can split it into its parts so i can see what's what? :)
[/quote]
Sure,
[code]$pattern = array(
'/(?<=[\s,.;:\'"?!&%$]);-\)(?=[\s,.;:\'"?!&%$])/'
);[/code]Here the basic match is:
[code]/;-\)/[/code]
which will match this smiley ' ; - ) ' (I added spaces so it won't get parsed in this post), where the '/' characters are the delimiters.
This:[code](?<=...)[/code]
Is a "positive lookbehind assertion", which works kind of like an if statement. In English, it essentially means: match '; - )' only if it is immediately preceded by one of the characters in the character class I've got in there, [\s,.;:\'"?!&%$] in this case. (I escaped the ' so PHP won't get confused.)
This part:[code](?=...)[/code]
Is essentially the same thing, except it's a "positive lookahead assertion". This:
[code](?=[\s,.;:\'"?!&%$])[/code]Means match '; - )' only if one of the characters in this class immediately follows it.

The really cool thing about lookahead and lookbehind (collectively, lookaround) is that they don't "consume" any characters in the match. A lookaround just checks whether something is there and allows the match to succeed if it is, without making the lookaround part of the final match.
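
To make the "consume" point concrete, here is a small sketch comparing the two approaches. The test string, the shortened [\s!] class and the '[wink]' replacement are made up for illustration.
[code]
<?php
$text = 'fun ;-)!';

// Without lookaround, the neighbouring characters are part of the match,
// so preg_replace swallows them along with the smiley.
$consumed = preg_replace('/[\s!];-\)[\s!]/', '[wink]', $text);

// With lookaround, the neighbours are only checked, so they stay put.
$kept = preg_replace('/(?<=[\s!]);-\)(?=[\s!])/', '[wink]', $text);

echo $consumed, "\n"; // fun[wink]
echo $kept, "\n";     // fun [wink]!
?>
[/code]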

I'll have to do some digging on that start of string thing. In order to not get stuck in huge loops, lookbehinds must be a fixed length, which stinks because I'd normally just use alternation to match the beginning of the line:
[code]preg_match('/(?:^|[\s,.;:\'"?!&%$])smiley/m', $foo);[/code]
'(?:...)' are non-capturing parentheses.

I'd probably just use two passes. This one:
[code]$pattern = array(
'/(?<=[\s,.;:\'"?!&%$]);-\)(?=[\s,.;:\'"?!&%$\n])/' // I added the newline character here (strictly redundant, since \s already matches \n)
);[/code]
And something like:
[code]$pattern = array(
'/^;-\)/m'
);[/code]

The first one should get all the smilies except the ones at the true start of the line. The second one will just match smilies at the beginning of the line.
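
A rough sketch of the two passes run together, assuming a made-up test string and a plain '[wink]' replacement. preg_replace applies the patterns in an array in order, so one call covers both passes.
[code]
<?php
$patterns = array(
    '/(?<=[\s,.;:\'"?!&%$]);-\)(?=[\s,.;:\'"?!&%$])/', // pass 1: smiley between allowed characters
    '/^;-\)/m',                                        // pass 2: smiley at the start of a line
);

$text = ";-) first line ;-) done\n;-) second line";
$result = preg_replace($patterns, '[wink]', $text);
echo $result, "\n";

// A smiley that is the last thing in the string is left alone by both patterns.
$tail = preg_replace($patterns, '[wink]', 'ok ;-)');
echo $tail, "\n";
?>
[/code]
Two things to watch: the second pattern has no lookahead, so it will also match a smiley glued to the text that follows it, and neither pattern matches something like 'ok ;-)' because the lookahead needs a character after the smiley.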

Hope that helps!

effigy · January 29, 2007

[quote author=c4onastick link=topic=124422.msg515498#msg515498 date=1170007283]
...lookbehinds have to be fixed length, so you can't use alternation to get that start of line anchor.
[/quote]

Ah, but you can. In PCRE, a lookbehind may contain top-level alternation as long as each alternative is itself fixed length; the "either" and "or" parts don't have to be the same length as each other.

The code below should get you by unless you're working with Unicode and locales.

[code]
<pre>
<?php
$tests = array(
    ':) :D :X',
    'Text:) :(Text :O',
    'Text:(Text',
    ' :) ',
    ':)',
);

$find = array(':)', ':(', ':D', ':O', ':X');
$replace = array('--smile--', '--frown--', '--grin--', '--surprise--', '--silence--');

foreach ($tests as $test) {
    echo $test, ' => ';
    foreach ($find as $i => $smiley) {
        // Lookbehind: start of string or a non-word character (\W already covers \s).
        // Lookahead: end of string or a non-word character.
        // Passing '/' to preg_quote also escapes the delimiter, for smilies like ':/'.
        $test = preg_replace('/(?<=^|[\s\W])' . preg_quote($smiley, '/') . '(?=\z|[\s\W])/', $replace[$i], $test);
    }
    echo $test, '<br>';
}
?>
</pre>
[/code]

c4onastick · January 29, 2007

Oh me of little faith! Thanks effigy.

c4onastick · January 29, 2007

AH! I figured out why it didn't work for me. I used:
[code](?<=(?:^|...))[/code]
Which doesn't work by the way... I think because it hides the alternation from the compiler in the lookaround.
[code](?<=^|...)[/code]
Does work.

redbullmarky · January 29, 2007

fantastic replies from the both of you - cheers lads ;D
After some hacking away I ended up with a bit of a mix and match of both but working perfectly.

Thanks again!

Mark

effigy · January 29, 2007

[quote author=c4onastick link=topic=124422.msg515918#msg515918 date=1170053139]
AH! I figured out why it didn't work for me. I used:
[code](?<=(?:^|...))[/code]
Which doesn't work by the way... I think because it hides the alternation from the compiler in the lookaround.
[code](?<=^|...)[/code]
Does work.
[/quote]

Interestingly enough, I tried the same pattern in Perl and it was interpreted as a variable length lookbehind. I was able to get around this by "inverting" the pattern to[tt] (?:^|(?<=[\s\W]))[/tt]. This also works in PHP.

On a side note, it may be better to check for[tt] \s\W [/tt]before[tt] ^[/tt]: a beginning-of-line anchor can only match once (unless you're in multi-line mode), so trying the more common case first gives you fewer failed alternation checks.
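
A short sketch of the "inverted" form in use, assuming a hypothetical replace_smiley() helper and a made-up test string. The right-hand side is left as a plain lookahead, since the fixed-length restriction discussed here only applies to lookbehinds.
[code]
<?php
// Hypothetical helper, for illustration only.
function replace_smiley($smiley, $replacement, $text)
{
    // Left side: start of string, OR a lookbehind for a single non-word character.
    // Right side: end of string or a non-word character.
    $pattern = '/(?:^|(?<=[\s\W]))' . preg_quote($smiley, '/') . '(?=\z|[\s\W])/';
    return preg_replace($pattern, $replacement, $text);
}

$out = replace_smiley(':)', '--smile--', ':) hi (ok :)) Text:)');
echo $out, "\n"; // --smile-- hi (ok --smile--) Text:)
?>
[/code]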
