itaym02 Posted September 3, 2009

I have the following string:

PHP Code:
$text="א אב אבי אביהו מדינה שול של";

I want to append 'אאא' to every word shorter than 4 characters, so that the string becomes:

"אאאא אבאאא אביאאא אביהו מדינה שולאאא שלאאא"

The code I am using is:

PHP Code:
$text="א אב אבי אביהו מדינה שול של";
$pattern='/\s(.{1,6})\s/';
$text=preg_replace($pattern,' $1אאא ',$text);
echo $text;

Which results in:

א אבאאא אבי אביהו מדינה שולאאא של

Problems:
1. It seems the word boundary is not recognized (hence my use of \s).
2. Why was אבי not replaced?

Link to comment https://forums.phpfreaks.com/topic/172978-solved-help-with-regexp-on-a-multibyte-string/
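On the two problems listed above, here is a minimal sketch, assuming the script file and the string are UTF-8 encoded and the mbstring extension is loaded. Without the u modifier PCRE counts bytes, and each Hebrew letter takes 2 bytes in UTF-8, so .{1,6} spans at most 3 letters. The pattern also consumes the whitespace on both sides of a word, so the space after אב is used up by the first match and cannot open a match for אבי, while the first and last words have whitespace on one side only. The lookaround pattern and the $suffix variable below are illustrative and not taken from the thread.

```php
<?php
// Sketch only: assumes UTF-8 source/input and the mbstring extension.
$text   = 'א אב אבי אביהו מדינה שול של';
$suffix = 'אאא'; // illustrative name, not from the thread

// Byte count versus character count for a 3-letter Hebrew word.
$bytes = strlen('אבי');             // 6
$chars = mb_strlen('אבי', 'UTF-8'); // 3

// The lookarounds assert "no non-whitespace neighbour" without consuming
// the separating space, so adjacent short words can both match, and they
// also succeed at the very start and end of the string.
// The u modifier makes {1,3} count characters instead of bytes.
$pattern = '/(?<!\S)\S{1,3}(?!\S)/u';
$result  = preg_replace($pattern, '${0}' . $suffix, $text);

echo $result, "\n";
// Prints: אאאא אבאאא אביאאא אביהו מדינה שולאאא שלאאא
```

Writing the backreference as ${0} keeps it unambiguous if the appended text ever starts with a digit.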
thebadbad Posted September 4, 2009

Finally got it working after a lot of tweaking:

<?php
header('Content-type: text/plain; charset=utf-8');
$text = 'א אב אבי אביהו מדינה שול של';
$add = 'אאא';
// u: treat pattern and subject as UTF-8; e: evaluate the replacement as PHP code
$text = preg_replace('~\S+~ue', "(mb_strlen('$0', 'utf-8') < 4) ? '$0$add' : '$0'", $text);
echo $text;
?>

Using a curly-bracket quantifier inside the pattern didn't work properly for me, so I'm grabbing each word (\S+ matches any run of non-whitespace characters) and then checking the length of the word with mb_strlen() inside the replacement. It's important to note that the u pattern modifier makes PCRE treat both the pattern and the subject string as UTF-8, and that the e modifier makes preg_replace() evaluate the replacement string as PHP code.

Edit: Unicode chars didn't display properly. Fixed by removing tags.
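The e modifier used above was deprecated in PHP 5.5.0 and removed in PHP 7.0.0, so on current PHP versions the same "grab each word, then check its length" idea needs preg_replace_callback(). The following is a minimal sketch under those assumptions (PHP 7 or later, UTF-8 input, mbstring loaded). The function name append_to_short_words() and its $maxLen parameter are hypothetical and do not come from the thread.

```php
<?php
// Sketch only: assumes PHP 7+, UTF-8 input and the mbstring extension.
// append_to_short_words() is a hypothetical helper, not from the thread.
function append_to_short_words(string $text, string $suffix, int $maxLen = 3): string
{
    $result = preg_replace_callback(
        '~\S+~u', // each run of non-whitespace characters, read as UTF-8
        function (array $m) use ($suffix, $maxLen): string {
            // Count characters, not bytes, before deciding to append.
            return mb_strlen($m[0], 'UTF-8') <= $maxLen
                ? $m[0] . $suffix
                : $m[0];
        },
        $text
    );

    // preg_replace_callback() returns null on failure (e.g. invalid UTF-8).
    if ($result === null) {
        throw new RuntimeException('PCRE error code: ' . preg_last_error());
    }

    return $result;
}

echo append_to_short_words('א אב אבי אביהו מדינה שול של', 'אאא'), "\n";
```

The callback receives the matched word as an ordinary string, so no replacement text is evaluated as code.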
itaym02 (Author) Posted September 4, 2009

Thanks