
[SOLVED] Help with smiley regex


redbullmarky


Hi all
I'll be the first to admit I suck at regex stuff...
I have an array of smilies and an array of filenames. I understand that with preg_quote I can convert my smilies into strings that are safe to use in a regex.

However, I need help with the pattern to search/replace under certain conditions: each side of the actual smiley MUST have one of the following:
1. a space
2. start/end of line
3. punctuation (comma, full stop, bracket, colon, semicolon, question mark, exclamation mark, etc.)

My problem is that when I use a pattern with preg_replace, the character I'm checking for on either side gets replaced too. So I guess the question is: how can I preg_replace, yet leave the surrounding character (whatever it is) intact?

Cheers ;)
Mark

Hey Mark,
You can use lookaround for stuff like this. effigy is the lookaround master around here.
[code]$pattern = array(
  '/(?<=[\s,.;:\'"?!&%$]);-\)(?=[\s,.;:\'"?!&%$])/'
);
$replacements = array(
  '<img src=\'smiley_pic.jpg\'>'
);

$test = 'That\'s great!;-) This is a lot of fun ;-)!';
$test = preg_replace($pattern, $replacements, $test);
[/code]

The only problem with this, based on your criteria, is that lookbehinds have to be fixed length, so you can't use alternation to get that start of line anchor. Maybe effigy will know a way around this, I can't think of one off the top of my head.

I have to be honest, I rarely use preg_quote, but you should be able to concat that lookbehind and the lookahead with your smiley to build the array.
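
A minimal sketch of that concatenation idea, assuming a smiley-to-filename map (the file names here are placeholders, not from the original post). Passing '/' as the second argument to preg_quote also escapes the delimiter, which matters for a smiley such as ":/". As noted above, this pattern still needs a character on both sides, so a smiley at the very start or end of the string is not matched.

```php
<?php
// Hypothetical smiley => image map; the file names are placeholders.
$smilies = array(
    ';-)' => 'wink.gif',
    ':-)' => 'smile.gif',
);

// Same character class as in the pattern above.
$edge = '[\s,.;:\'"?!&%$]';

$patterns = array();
$replacements = array();
foreach ($smilies as $smiley => $file) {
    // Lookbehind + escaped smiley + lookahead, wrapped in "/" delimiters.
    $patterns[] = '/(?<=' . $edge . ')' . preg_quote($smiley, '/') . '(?=' . $edge . ')/';
    $replacements[] = '<img src="' . $file . '">';
}

$test = 'That\'s great!;-) This is a lot of fun :-)!';
$result = preg_replace($patterns, $replacements, $test);
echo $result, "\n";
```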

whoa

Lookaheads/lookbehinds are a bit new to me. What can they do for me? The preg_ stuff in your example kind of clouds the smiley bit, since there are lots of : and ) characters, but let's say I have something like:

[code]
<?php
$text = ":) hello this is some text (this is some more in brackets with smily at the end ;)) ok :)!";

$search = array(':)', ';)');
$replace = array('smile', 'wink');

// all the preg_quote / preg_replace stuff here
?>
[/code]

So far, I have a loop that escapes every entry in the smiley array and adds the opening and closing /, then just a simple:

[code]
$text = preg_replace($search, $replace, $text);
[/code]
to do the business. So I guess I'm looking to make sure that brackets, spaces, start/end, etc. are all treated the same. If your example does that, any chance you can split it into its parts so I can see what's what? :)

cheers for your help
Mark

[quote author=redbullmarky link=topic=124422.msg515503#msg515503 date=1170007930]
If your example does that, any chance you can split it into its parts so I can see what's what? :)
[/quote]
Sure,
[code]$pattern = array(
  '/(?<=[\s,.;:\'"?!&%$]);-\)(?=[\s,.;:\'"?!&%$])/'
);[/code]
Here the basic match is:
[code]/;-\)/[/code]
which will match this smiley ' ; - ) ' (I added spaces so it won't get parsed in this post), where the '/' characters are the delimiters.
This:[code](?<=...)[/code]
Is a "positive lookbehind assertion", which works kind of like an if statement. In plain English, it means: match '; - )' only if it is preceded by one of the characters in the character class I've put in there, [\s,.;:\'"?!&%$] in this case. (I escaped the ' so PHP won't get confused.)
This part:[code](?=...)[/code]
Is essentially the same thing, except it is a "positive lookahead assertion". This:
[code](?=[\s,.;:\'"?!&%$])[/code]Means match '; - )' only if one of the characters in this class immediately follows it.

The really cool thing about lookahead and lookbehind (collectively, lookaround) is that they don't "consume" any characters in the match. A lookaround just checks whether its pattern is there and lets the match succeed if it is, without making the lookaround part of the final match.
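
To make the "consume" point concrete, here is a small side-by-side sketch. The shortened character class [\s!] and the 'WINK' placeholder are only for illustration; they are not from the patterns above.

```php
<?php
$text = 'fun ;-)!';

// Consuming version: the neighbouring characters are part of the match,
// so they are replaced along with the smiley.
$consumed = preg_replace('/[\s!];-\)[\s!]/', 'WINK', $text);

// Lookaround version: the neighbours are only checked, not matched,
// so they survive the replacement.
$kept = preg_replace('/(?<=[\s!]);-\)(?=[\s!])/', 'WINK', $text);

echo $consumed, "\n"; // funWINK
echo $kept, "\n";     // fun WINK!
```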

I'll have to do some digging on that start of string thing. In order to not get stuck in huge loops, lookbehinds must be a fixed length, which stinks because I'd normally just use alternation to match the beginning of the line:
[code]preg_match('/(?:^|[\s,.;:\'"?!&%$])smiley/m', $foo);[/code]
'(?:...)' are non-capturing parentheses.

I'd probably just use two passes. This one:
[code]$pattern = array(
  '/(?<=[\s,.;:\'"?!&%$]);-\)(?=[\s,.;:\'"?!&%$\n])/' // I added the newline character here (\s already matches it, so this is redundant but harmless)
);[/code]
And something like:
[code]$pattern = array(
  '/^;-\)/m'
);[/code]

The first one should get all the smilies except the ones at the true start of the line. The second one will just match smilies at the beginning of the line.
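
A rough sketch of the two passes run together, assuming a 'WINK' placeholder as the replacement. preg_replace applies the patterns in array order, so both passes happen in one call. Note that a smiley sitting mid-line at the very end of the text is still missed, because the lookahead in the first pattern needs a following character.

```php
<?php
$text = ";-) first line ;-) done\n;-) second line";

$patterns = array(
    // Pass 1: smilies with an allowed character on both sides.
    '/(?<=[\s,.;:\'"?!&%$]);-\)(?=[\s,.;:\'"?!&%$])/',
    // Pass 2: smilies at the start of a line (the /m flag makes ^ match per line).
    '/^;-\)/m',
);

// A single string replacement is used for every pattern in the array.
$result = preg_replace($patterns, 'WINK', $text);
echo $result, "\n";
```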

Hope that helps!

[quote author=c4onastick link=topic=124422.msg515498#msg515498 date=1170007283]
...lookbehinds have to be fixed length, so you can't use alternation to get that start of line anchor.
[/quote]

Ah, but you can. Technically, alternation is fixed length because you're saying either/or; of course, each alternative has to be fixed length on its own.

The code below should get you by unless you're working with Unicode and locales.

[code]
<pre>
<?php
$tests = array(
':) :D :X',
'Text:) :(Text :O',
'Text:(Text',
' :) ',
':)',
);

$find = array(':)', ':(', ':D', ':O', ':X');
$replace = array('--smile--', '--frown--', '--grin--', '--surprise--', '--silence--');

foreach ($tests as $test) {
    echo $test, ' => ';
    foreach ($find as $i => $smiley) {
        // Pass the delimiter so preg_quote also escapes any "/" inside a smiley.
        $pattern = '/(?<=^|[\s\W])' . preg_quote($smiley, '/') . '(?=\z|[\s\W])/';
        $test = preg_replace($pattern, $replace[$i], $test);
    }
    echo $test, '<br>';
}
?>
</pre>
[/code]

[quote author=c4onastick link=topic=124422.msg515918#msg515918 date=1170053139]
AH! I figured out why it didn't work for me. I used:
[code](?<=(?:^|...))[/code]
Which doesn't work by the way... I think because it hides the alternation from the compiler in the lookaround.
[code](?<=^|...)[/code]
Does work.
[/quote]

Interestingly enough, I tried the same pattern in Perl and it was interpreted as a variable-length lookbehind. I was able to get around this by "inverting" the pattern to [tt](?:^|(?<=[\s\W]))[/tt]. This also works in PHP.

On a side note, it may be better to check for [tt]\s\W[/tt] before [tt]^[/tt], because a beginning-of-line anchor can only match once (unless you're in multi-line mode), so testing the character class first gives you fewer chances of failure in your alternation checks.
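
For reference, a short sketch of the inverted form with the character class tried before the anchor. The helper name replace_smiley is made up for this example, and the lookahead is the same one used in the earlier code.

```php
<?php
// Hypothetical helper: replaces one smiley when it is delimited on both sides.
function replace_smiley($smiley, $replacement, $text)
{
    // The lookbehind sits inside the alternation instead of wrapping it,
    // so the lookbehind itself is always exactly one character long.
    $pattern = '/(?:(?<=[\s\W])|^)' . preg_quote($smiley, '/') . '(?=\z|[\s\W])/';
    return preg_replace($pattern, $replacement, $text);
}

echo replace_smiley(':)', '--smile--', ':) hi :)'), "\n"; // --smile-- hi --smile--
echo replace_smiley(':)', '--smile--', 'Text:) ok'), "\n"; // Text:) ok (preceded by a word character)
```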
