[SOLVED] help with regexp on a multibyte string


I have the following string:

PHP Code:

$text="א אב אבי אביהו מדינה שול של";

I want to append 'אאא' to every word shorter than 4 characters, so that the string becomes:

"אאאא אבאאא אביאאא אביהו מדינה שולאאא שלאאא"

 

The code I am using is:

PHP Code:

   $text="א אב אבי אביהו מדינה שול של";
    $pattern='/\s(.{1,6})\s/';
    $text=preg_replace($pattern,' $1אאא ',$text);
    echo $text;

Which results in:

א אבאאא אבי אביהו מדינה שולאאא של

 

 

Problems:

1. It seems the word boundary (\b) is not recognized, hence my use of \s.

2. Why was the אבי not replaced?
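
Both symptoms can be reproduced in isolation. The following is a diagnostic sketch rather than part of the original post, and the helper name capture_with_original_pattern is made up for illustration. It assumes the file is saved as UTF-8 and that the mbstring extension is loaded. Without the u modifier PCRE works on bytes: by default \b is defined in terms of \w, which then covers only ASCII word characters, and .{1,6} counts bytes rather than letters.

```php
<?php
// Diagnostic sketch; the helper name below is hypothetical.

// Returns the words captured by the original byte-oriented pattern.
function capture_with_original_pattern(string $text): array
{
    // No u modifier: "." matches one byte, and each Hebrew letter is two
    // bytes in UTF-8, so {1,6} allows at most three Hebrew letters.
    // Each match also consumes its trailing \s, so the word right after a
    // match has no leading whitespace left and is skipped. The first and
    // last words lack whitespace on one side and can never match.
    preg_match_all('/\s(.{1,6})\s/', $text, $matches);
    return $matches[1];
}

$text = 'א אב אבי אביהו מדינה שול של';

echo strlen('אבי'), "\n";             // 6 (bytes)
echo mb_strlen('אבי', 'UTF-8'), "\n"; // 3 (characters)

// With default locale settings this should capture only אב and שול, which
// matches the output reported above.
print_r(capture_with_original_pattern($text));
```

That would explain why אבי was skipped: the space before it had already been consumed as the trailing \s of the match on אב.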

Finally got it working after a lot of tweaking:

 

<?php
header('Content-type: text/plain; charset=utf-8');

$text = 'א אב אבי אביהו מדינה שול של';
$add = 'אאא';

// Note: the e modifier needs PHP < 7 (deprecated in 5.5, removed in 7.0).
$text = preg_replace('~\S+~ue', "(mb_strlen('$0', 'utf-8') < 4) ? '$0$add' : '$0'", $text);
?>

 

A curly-bracket quantifier inside the pattern didn't work properly for me, so I grab each word with \S+ (any run of non-whitespace characters) and check its length with mb_strlen() inside the replacement. Note that the u pattern modifier makes PCRE treat the pattern and subject as UTF-8, and the e modifier makes preg_replace() evaluate the replacement string as PHP code.
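
For PHP versions without the e modifier, the same length check can be done with preg_replace_callback(). The sketch below is one possible way to write it, not the poster's code: the function names and the $limit parameter are illustrative, and it assumes UTF-8 input and the mbstring extension. The second function shows that a bounded quantifier can also work once u is set and the word edges are asserted with lookarounds instead of being consumed.

```php
<?php
// Sketch only; function names and the $limit parameter are illustrative.

// Callback variant: match every word, then decide per word whether to append.
function append_to_short_words(string $text, string $add, int $limit = 4): string
{
    $result = preg_replace_callback(
        '~\S+~u',
        function (array $m) use ($add, $limit): string {
            // mb_strlen counts characters, not bytes.
            return mb_strlen($m[0], 'UTF-8') < $limit ? $m[0] . $add : $m[0];
        },
        $text
    );
    // preg_replace_callback returns null on error (e.g. invalid UTF-8).
    return $result ?? $text;
}

// Quantifier variant: with u, \S matches a whole character, and the
// lookarounds check for word edges without consuming whitespace, so
// adjacent short words can all match.
function append_with_quantifier(string $text, string $add): string
{
    $result = preg_replace_callback(
        '~(?<!\S)\S{1,3}(?!\S)~u',
        function (array $m) use ($add): string {
            return $m[0] . $add;
        },
        $text
    );
    return $result ?? $text;
}

$text = 'א אב אבי אביהו מדינה שול של';
echo append_to_short_words($text, 'אאא'), "\n";
echo append_with_quantifier($text, 'אאא'), "\n";
```

Both calls should print the target string from the question. The callback variant is easier to adapt if the length rule changes, while the quantifier variant keeps everything in the pattern.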

 

Edit: Unicode chars didn't display properly. Fixed by removing tags.