Modify every nth word in a string uniformly


Solved by requinix

I am trying to build a function that can change words uniformly, e.g. "The black cat is sitting on the mat." I want every second word (or every nth word) to be replaced, so the sentence ends up as "The MOD cat MOD sitting MOD the MOD". The words are changed uniformly. Currently I have:

$alltext = 'The black cat is sitting on the mat';

//using regex, break the text into an array of words
$pattern = '/([a-zA-Z]|\xC3[\x80-\x96\x98-\xB6\xB8-\xBF]|\xC5[\x92\x93\xA0\xA1\xB8\xBD\xBE]){1,}/';
$n_words = preg_match_all($pattern, $alltext, $match_arr, PREG_OFFSET_CAPTURE);
$wordcnt = 0;

foreach ($match_arr[0] as $val)
{
    $wordcnt++;
    $aword = $val[0];   // the matched word
    $wordpos = $val[1]; // its byte offset in the original string

    // replaces every (case-insensitive) occurrence of $aword, not just this one
    $alltext = str_ireplace($aword, 'MOD'.changer($aword), $alltext);
}

function changer($in)
{
    $in = $in.'IFY';
    return $in;
}

This does not replace every nth word uniformly. How can I do this?

Use preg_split() to split the sentence into words, run a loop that replaces every other word, then implode() it all back together.

Translating the $pattern you have now into one that matches non-word characters will be easier if you can tell me which Unicode ranges you're trying to match. That, by the way, is much easier to do with the actual Unicode support PCRE has than by constructing the UTF-8 bytes yourself.
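As a rough illustration of that last point, here is a sketch (not from the original posts) comparing the question's byte-range pattern with \pL in UTF-8 mode. The test word "Łódź" is just an arbitrary example containing letters (Ł, ź) that the hand-picked byte ranges happen to miss.

```php
<?php
// The question's byte-level pattern: ASCII letters plus a hand-picked
// subset of two-byte UTF-8 sequences (no /u flag, so it matches raw bytes).
$bytePattern = '/([a-zA-Z]|\xC3[\x80-\x96\x98-\xB6\xB8-\xBF]|\xC5[\x92\x93\xA0\xA1\xB8\xBD\xBE]){1,}/';

// Unicode version: \pL is any character with the Unicode "letter" property,
// and /u makes PCRE treat the subject as UTF-8.
$unicodePattern = '/\pL+/u';

$text = 'Łódź';

preg_match_all($bytePattern, $text, $byteMatches);
preg_match_all($unicodePattern, $text, $unicodeMatches);

print_r($byteMatches[0]);    // only the part covered by the byte ranges: "ód"
print_r($unicodeMatches[0]); // the whole word: "Łódź"
```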

 

Or preg_replace() every pair of words with the first word + the replacement. Again, the regex will be a lot nicer if you use PCRE's Unicode support.
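For the every-second-word case, that preg_replace() idea might look something like this minimal sketch (assuming $n = 2 is hard-coded): match a word, the non-letters after it, and the next word, then put back only the first word and its separator.

```php
<?php
$alltext = 'The black cat is sitting on the mat';

// ([\pL']+[^\pL']+) captures a word plus the non-letters after it (e.g. "The ").
// The trailing [\pL']+ is the next word, which is dropped and replaced by MOD.
$result = preg_replace('/([\pL\']+[^\pL\']+)[\pL\']+/u', '$1MOD', $alltext);

echo $result; // The MOD cat MOD sitting MOD the MOD
```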

  • Solution

Alright, based on those byte ranges it looks like you're aiming for just letters. You can do them all with Unicode, but I suggest you include apostrophes in there too (for contractions):

/[\pL']+/u
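A quick sketch of what that pattern matches, using a made-up sample string; the second call shows why the apostrophe is worth including.

```php
<?php
// Word pattern from the answer: Unicode letters plus apostrophes.
preg_match_all('/[\pL\']+/u', "Don't split it's contractions", $matches);
print_r($matches[0]); // Don't, split, it's, contractions

// Letters only: the contraction falls apart into two pieces.
preg_match_all('/\pL+/u', "Don't", $lettersOnly);
print_r($lettersOnly[0]); // Don, t
```

It only covers the plain ASCII apostrophe; a typographic one (’, U+2019) would still split a word unless you add it to the character class.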
The preg_split method is a bit tricky to wrap your head around, but the code is simple:

$words = preg_split('/([^\pL\']+)/u', $alltext, -1, PREG_SPLIT_DELIM_CAPTURE);
$replace = "MOD";
$n = 2; // every second word
$i = ($n - 1) * 2;

// $words includes the words *and the spaces*, alternating, because you'll need the spaces when you implode() it back together
// [0] is a word, [1] is a space, [2] is a word, and so on
// [0], [2], [4], ... is every word ($n=1, 0+2i),
// [2], [6], [10], ... is every second word ($n=2, 2+4i),
// [4], [10], [16], ... is every third word ($n=3, 4+6i),
// or another way, [($n-1)*2 + ($n*2)i]

$wordcount = count($words);
for ($i = ($n - 1) * 2; $i < $wordcount; $i += $n * 2) {
	$words[$i] = $replace;
}

$words = implode('', $words);
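The index arithmetic in the comments can be sanity-checked with a small loop. This sketch uses a hypothetical helper, visitedIndexes(), and assumes an array of 17 elements (9 words alternating with 8 separators); the length is arbitrary.

```php
<?php
// Hypothetical helper: the positions visited by the answer's loop,
// for ($i = ($n - 1) * 2; $i < $count; $i += $n * 2)
function visitedIndexes(int $n, int $count): array
{
    $visited = [];
    for ($i = ($n - 1) * 2; $i < $count; $i += $n * 2) {
        $visited[] = $i;
    }
    return $visited;
}

$count = 17; // 9 words + 8 separators, alternating
foreach ([1, 2, 3] as $n) {
    echo "n=$n: ", implode(', ', visitedIndexes($n, $count)), "\n";
}
// n=1: 0, 2, 4, 6, 8, 10, 12, 14, 16
// n=2: 2, 6, 10, 14
// n=3: 4, 10, 16
```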
The preg_replace() version is just as simple but the regex is a bit longer: count out $n-1 words and spaces, capture that, capture the last word, then replace the lot with the capture and your replacement word. The advantage is that you have preg_replace() doing all the work for you.

$replace = "MOD";
$n = 2; // every second word

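// escape \ and $ so the replacement word is inserted literally (preg_quote() is for patterns, not replacements)
$words = preg_replace('/(([\pL\']+[^\pL\']+){' . ($n - 1) . '})[\pL\']+/u', '$1' . addcslashes($replace, '\\$'), $alltext);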
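Wrapped into a function (the name replaceEveryNthWord() is just for illustration), the preg_replace() version can be tried with different values of $n. The sketch escapes the replacement with addcslashes(), on the assumption that the replacement word should be inserted literally even if it contains \ or $.

```php
<?php
// Hypothetical wrapper around the preg_replace() approach from the answer.
function replaceEveryNthWord(string $text, int $n, string $replace): string
{
    // ( ([word][non-word]){n-1} ) [word]
    $pattern = '/(([\pL\']+[^\pL\']+){' . ($n - 1) . '})[\pL\']+/u';
    // In a replacement string, \ and $ introduce backreferences, so escape them.
    return preg_replace($pattern, '$1' . addcslashes($replace, '\\$'), $text);
}

$alltext = 'The black cat is sitting on the mat';

echo replaceEveryNthWord($alltext, 2, 'MOD'), "\n"; // The MOD cat MOD sitting MOD the MOD
echo replaceEveryNthWord($alltext, 3, 'MOD'), "\n"; // The black MOD is sitting MOD the mat
```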

I'm certainly not the best at regular expressions, but wouldn't it be easier to explode the sentence on ' ' and loop through the resulting array?

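$sentence = 'The black cat is sitting on the mat';
$words = explode(' ', $sentence);
$count = count($words);
for ($i = 0; $i < $count; $i++) {
	// odd indexes are the 2nd, 4th, ... words (the array starts at 0)
	if ($i % 2 == 1) {
		$words[$i] = 'MOD';
	}
}
$sentence = implode(' ', $words);
print("<p>{$sentence}</p>");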

Or am I overlooking something obvious?

Space alone isn't enough if you want to be really pedantic. There are other symbols to consider, like periods at the end of sentences, which would be lost unless you made sure to insert them back in. And if there are two spaces in a row, explode() will return an empty string between them.
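To make the double-space problem concrete, here is a small sketch using the explode() approach (replacing odd indexes) on a made-up string that has two spaces after the first word.

```php
<?php
// explode() on a single space, with two spaces after "The"
$words = explode(' ', 'The  black cat');
// $words is now "The", "", "black", "cat": an empty string sits between the two spaces

// Replace odd indexes (2nd, 4th, ... element), as in the explode() suggestion
foreach ($words as $i => $word) {
    if ($i % 2 == 1) {
        $words[$i] = 'MOD';
    }
}

// The empty string got "replaced" instead of "black", so everything is off by one
echo implode(' ', $words); // The MOD black MOD
```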

 

"Explode"ing on non-word characters is the next step, but that's too sophisticated for explode() to handle. You'd need regular expressions. And then you'd need to capture what you "exploded" on so you'd be sure to keep track of it. And now you've arrived at the preg_split() option I gave :D

I'll have to do some digging into regex - as I should have said, I'm terrible at them. I can't even read your string :). Good point about double spaces, but wouldn't using array_filter() remove any empty elements? And exploding on a space would keep the periods at the end of words because it's before the space (or double space). Also, what non-word characters would have to be exploded upon?

 

Interesting discussion about a topic I know woefully little about (let me know if this is now veering way off topic and I should open this in the miscellaneous section), so thanks much for expounding and explaining!

Why not just use explode() to separate the sentence into an array, then run through the array changing every odd-indexed word (since the array starts at zero)?

 

A bit of code:

 

$words = explode(" ", $sentence);

for ($n = 0; $n < count($words); $n++) {
   // keep even indexes (1st, 3rd, ... word), replace odd ones
   echo ($n % 2 == 0) ? $words[$n] : 'MOD', ' ';
}

See reply #6 ^ by requinix

I can't even read your string :).

I can explain them. First one is pretty simple:

- () capture

- [^]+ one or more characters that are not

- \pL Unicode characters that are classified as "letters"

- \' or apostrophes

- /u flag to enable UTF-8/Unicode mode

preg_split() would normally work like explode(), but with the PREG_SPLIT_DELIM_CAPTURE flag it also returns anything captured. Thus the explanation in the comments.
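To see what the flag changes, this sketch runs the same split with and without PREG_SPLIT_DELIM_CAPTURE on the example sentence from the question.

```php
<?php
$alltext = 'The black cat is sitting on the mat';

// Without the flag, the separators are thrown away, just like explode().
$plain = preg_split('/([^\pL\']+)/u', $alltext);
// "The", "black", "cat", "is", "sitting", "on", "the", "mat"

// With PREG_SPLIT_DELIM_CAPTURE, whatever the parenthesized group matched
// is returned as well, so words and separators alternate.
$withDelims = preg_split('/([^\pL\']+)/u', $alltext, -1, PREG_SPLIT_DELIM_CAPTURE);
// "The", " ", "black", " ", "cat", ..., "the", " ", "mat"

// Because nothing is lost, implode('') restores the original string.
echo implode('', $withDelims) === $alltext ? 'round trip ok' : 'mismatch';
```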

 

Second is longer but really not that much more complicated:

- [\pL\']+ A word consisting of Unicode letters or apostrophes

- [^\pL\']+ Things that aren't letters or apostrophes (like spaces or periods)

- {$n-1} Repeat those $n-1 times

- [\pL\']+ The last word

$1 will be all but the last word.
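To see what $1 ends up holding, preg_match() can run the same pattern once and expose its groups. This sketch assumes $n = 3. Group 2 sits inside the repetition, so it only keeps its last iteration, which is why the replacement uses the outer group $1.

```php
<?php
$alltext = 'The black cat is sitting on the mat';
$n = 3;

$pattern = '/(([\pL\']+[^\pL\']+){' . ($n - 1) . '})[\pL\']+/u';
echo $pattern, "\n"; // /(([\pL']+[^\pL']+){2})[\pL']+/u

preg_match($pattern, $alltext, $m);
// $m[0] is "The black cat"  (the whole match)
// $m[1] is "The black "     ($n-1 words with their separators)
// $m[2] is "black "         (a repeated group keeps only its last repetition)
```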

 

Good point about double spaces, but wouldn't using array_filter() remove any empty elements?

By default it would also remove the string "0", because "0" == false. You'd have to use a callback function that actually checks $word !== "".
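A short sketch of the difference, with a made-up string containing both a double space and a "0". Note that array_filter() keeps the original keys, so the result needs array_values() before any $i % 2 style index arithmetic.

```php
<?php
$words = explode(' ', 'I have  0 cats');
// "I", "have", "", "0", "cats"

// Default array_filter() drops every falsy value, including the string "0".
$default = array_filter($words);
// keys 0, 1, 4: "I", "have", "cats"

// A callback that only rejects empty strings keeps "0".
$filtered = array_filter($words, function ($word) {
    return $word !== '';
});

// array_filter() preserves keys, so reindex before doing index arithmetic.
$filtered = array_values($filtered);
// "I", "have", "0", "cats"
```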

 

And exploding on a space would keep the periods at the end of words because it's before the space (or double space).

Right, but what if one of those words were being replaced? "A very short sentence." would become "A MOD short MOD" - no more period. You'd have to detect that period (or comma, or exclamation point, or...) and add it back in.
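The same sentence run through both approaches, as a sketch: explode() on spaces versus the preg_split() loop from the answer with $n = 2.

```php
<?php
$sentence = 'A very short sentence.';

// explode() on spaces: the period is glued to "sentence." and is lost with it.
$words = explode(' ', $sentence);
foreach ($words as $i => $word) {
    if ($i % 2 == 1) {
        $words[$i] = 'MOD';
    }
}
$viaExplode = implode(' ', $words); // A MOD short MOD

// preg_split() with captured separators keeps "." as its own element.
$parts = preg_split('/([^\pL\']+)/u', $sentence, -1, PREG_SPLIT_DELIM_CAPTURE);
for ($i = 2; $i < count($parts); $i += 4) { // every second word, as in the answer
    $parts[$i] = 'MOD';
}
$viaSplit = implode('', $parts); // A MOD short MOD.
```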

 

Also, what non-word characters would have to be exploded upon?

Anything but letters and apostrophes. They're non-word characters, which means they also act as word separators. Arguably hyphens could be included too, since "non-word" is either one word or two depending on how you look at it, except hyphens are used for a lot more than that, so you'd need more sophisticated logic like "hyphens are considered word characters if they have a letter on both sides, otherwise not", which would suck.
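For the curious, that "letter on both sides" rule can be expressed with lookarounds. This is one possible sketch, not something proposed in the thread, and the sample text is invented.

```php
<?php
// A word is one or more of: a letter/apostrophe, or a hyphen that has a
// letter immediately before it (lookbehind) and immediately after it (lookahead).
$pattern = '/(?:[\pL\']|(?<=\pL)-(?=\pL))+/u';

preg_match_all($pattern, 'A non-word - and a dash-', $matches);
print_r($matches[0]); // A, non-word, and, a, dash
```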

Thanks. It has a few hiccups, but I can do the rest; it selects null characters.
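The "null characters" are most likely the empty strings preg_split() returns when the text starts or ends with a separator. For example, "The black cat." splits into "The", " ", "black", " ", "cat", ".", "", and with $n = 2 the loop from the answer replaces both "black" and that trailing "", giving "The MOD cat.MOD". One possible fix, sketched here with a hypothetical helper name, is to count only non-empty words instead of relying on fixed positions.

```php
<?php
// Hypothetical helper: replace every $n-th word, counting only real words,
// so empty strings at the start/end of the split never count as words.
function replaceEveryNthWordSafe(string $text, int $n, string $replace): string
{
    $parts = preg_split('/([^\pL\']+)/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $wordNumber = 0;

    // Even indexes hold words (possibly empty); odd indexes hold separators.
    for ($i = 0; $i < count($parts); $i += 2) {
        if ($parts[$i] === '') {
            continue; // leading/trailing empty element, not a word
        }
        $wordNumber++;
        if ($wordNumber % $n === 0) {
            $parts[$i] = $replace;
        }
    }

    return implode('', $parts);
}

echo replaceEveryNthWordSafe('The black cat.', 2, 'MOD'), "\n";   // The MOD cat.
echo replaceEveryNthWordSafe('"Hi," she said.', 2, 'MOD'), "\n"; // "Hi," MOD said.
```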
