Modify every nth word in a string uniformly

megatr0n · May 19, 2015

I am trying to build a function that changes words at uniform intervals. For example, given "The black cat is sitting on the mat", I want every second word (or every nth word) replaced, so it ends up like "The MOD cat MOD sitting MOD the MOD". Currently I have:

$alltext = 'The black cat is sitting on the mat';

//using regex break text into words into an array
$pattern = '/([a-zA-Z]|\xC3[\x80-\x96\x98-\xB6\xB8-\xBF]|\xC5[\x92\x93\xA0\xA1\xB8\xBD\xBE]){1,}/';
$n_words = preg_match_all($pattern, $input_str, $match_arr, PREG_OFFSET_CAPTURE);
$wordcnt = 0;

foreach ($match_arr[0] as $val)
{
    $wordcnt++;
    $aword = $val[0];
    $wordpos = $val[1];
    $alltext = str_ireplace($wordpos, $aword, 'MOD'.changer($aword),$alltext);
}

function changer($in)
{
    $in = $in.'IFY';
    return $in;
}

This does not replace every nth word uniformly. How can I do this?

requinix · May 19, 2015

Use preg_split() to split the sentence into words, run a loop that replaces every other word, then implode() it all back together.

Translating the $pattern you have now into one that matches non-word characters will be easier if you can tell me what Unicode ranges you're trying to match. Which, by the way, is done much more easily with the actual Unicode support that PCRE has instead of constructing the bytes yourself.
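
As a rough illustration of the PCRE Unicode support mentioned above (a sketch only; the sample string and variable names are made up, and the input is assumed to be valid UTF-8):

```php
<?php
// \pL matches any Unicode letter; the /u modifier makes PCRE treat
// both the pattern and the subject as UTF-8.
$sample = 'Le café est fermé';   // hypothetical sample text
preg_match_all('/\pL+/u', $sample, $matches);
$letterRuns = $matches[0];       // one entry per run of letters
print_r($letterRuns);
```

The accented letters are matched without listing any byte ranges by hand.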

Or preg_replace() every pair of words with the first word + the replacement. Again, the regex will be a lot nicer if you use PCRE's Unicode support.

megatr0n · May 19, 2015

The regex that I have is not the best. It took me a while to come up with it, and it does find the matches I want. Using preg_split() seems like the way to go.

requinix · May 20, 2015

Alright, based on those byte ranges it looks like you're aiming for just letters. You can do them all with Unicode, but I suggest you include apostrophes in there too (for contractions):

/[\pL']+/u
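
A small sketch of why the apostrophe matters, assuming a made-up sample string:

```php
<?php
$sample = "Don't panic";  // hypothetical sample text

// Letters only: the apostrophe splits the contraction in two.
preg_match_all('/\pL+/u', $sample, $lettersOnly);

// Letters plus apostrophe: the contraction stays one word.
preg_match_all('/[\pL\']+/u', $sample, $withApostrophe);

print_r($lettersOnly[0]);
print_r($withApostrophe[0]);
```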

The preg_split method is a bit tricky to wrap your head around, but the code is simple:

$words = preg_split('/([^\pL\']+)/u', $alltext, -1, PREG_SPLIT_DELIM_CAPTURE);
$replace = "MOD";
$n = 2; // every second word
$i = ($n - 1) * 2;

// $words includes the words *and the spaces*, alternating, because you'll need the spaces when you implode() it back together
// [0] is a word, [1] is a space, [2] is a word, and so on
// [0], [2], [4], ... is every word ($n=1, 0+2i),
// [2], [6], [10], ... is every second word ($n=2, 2+4i),
// [4], [10], [16], ... is every third word ($n=3, 4+6i),
// or another way, [($n-1)*2 + ($n*2)i]

$wordcount = count($words);
for ($i = ($n - 1) * 2; $i < $wordcount; $i += $n * 2) {
	$words[$i] = $replace;
}

$words = implode('', $words);
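
For reuse, the same idea could be wrapped in a function. This is only a sketch: the name replace_every_nth_word() is hypothetical, PHP 7+ type declarations are assumed, and it counts words explicitly instead of stepping through indexes, so that the empty string preg_split() returns when the text starts or ends with a separator is not counted as a word.

```php
<?php
// Hypothetical helper; not part of the original post.
function replace_every_nth_word(string $text, int $n, string $replace): string
{
    if ($n < 1) {
        return $text; // assumption: a non-positive $n means "change nothing"
    }
    // Even indexes hold words, odd indexes hold the captured separators.
    $parts = preg_split('/([^\pL\']+)/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $wordNo = 0;
    foreach ($parts as $i => $part) {
        // An empty string at an even index only appears at the very start
        // or end of the text, so it is skipped rather than counted.
        if ($i % 2 === 0 && $part !== '') {
            $wordNo++;
            if ($wordNo % $n === 0) {
                $parts[$i] = $replace;
            }
        }
    }
    return implode('', $parts);
}

echo replace_every_nth_word('The black cat is sitting on the mat', 2, 'MOD'), "\n";
```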

The preg_replace() version is just as simple but the regex is a bit longer: count out $n-1 words and spaces, capture that, capture the last word, then replace the lot with the capture and your replacement word. The advantage is that you have preg_replace() doing all the work for you.

$replace = "MOD";
$n = 2; // every second word

$words = preg_replace('/(([\pL\']+[^\pL\']+){' . ($n - 1) . '})[\pL\']+/u', '$1' . preg_quote($replace), $alltext);
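
One caveat with the call above: preg_quote() is meant for escaping text that goes into a pattern, not into a replacement string, so a replacement containing characters such as "." could pick up stray backslashes. A possible alternative, sketched here with a hypothetical function name and PHP 7+ type declarations assumed, is preg_replace_callback(), where the returned string is used literally:

```php
<?php
// Hypothetical helper; a variation on the preg_replace() approach above.
function replace_every_nth_word_regex(string $text, int $n, string $replace): string
{
    if ($n < 1) {
        return $text; // assumption: a non-positive $n means "change nothing"
    }
    // Group 1: the first $n-1 words with their separators. Then the nth word.
    $pattern = '/((?:[\pL\']+[^\pL\']+){' . ($n - 1) . '})[\pL\']+/u';
    return preg_replace_callback($pattern, function (array $m) use ($replace) {
        // The callback's return value is inserted as-is, so "$1" or "\"
        // inside $replace has no special meaning.
        return $m[1] . $replace;
    }, $text);
}

echo replace_every_nth_word_regex('The black cat is sitting on the mat', 3, 'MOD'), "\n";
```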

maxxd · May 20, 2015

I'm certainly not the best at regular expressions, but wouldn't it be easier to explode the sentence on ' ' and loop through the resulting array?

$words = explode(' ', $sentence);
for($i=0; $i<count($words); $i++){
	if($i%2 == 1){ // odd indexes are the 2nd, 4th, ... words
		$words[$i] = 'MOD';
	}
}
$sentence = implode(' ', $words);
print("<p>{$sentence}</p>");

Or am I overlooking something obvious?

requinix · May 20, 2015

I'm certainly not the best at regular expressions, but wouldn't it be easier to explode the sentence on ' ' and loop through the resulting array?

Space alone isn't enough if you want to be really pedantic. There are other symbols to consider, like periods at the end of sentences, that would be lost unless you made sure to insert them back in. And if there are two spaces in a row, explode() will return an empty string between them.
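
A quick way to see the double-space behaviour (the sample sentence is made up):

```php
<?php
$sentence = 'Two  spaces here';  // hypothetical input with a double space
$pieces = explode(' ', $sentence);
print_r($pieces);  // the second element is an empty string
```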

"Explode"ing on non-word characters is the next step, but that's too sophisticated for explode() to handle. You'd need regular expressions. And then you'd need to capture what you "exploded" on so you'd be sure to keep track of it. And now you've arrived at the preg_split() option I gave

maxxd · May 20, 2015

I'll have to do some digging into regex - as I should have said, I'm terrible at them. I can't even read your string. Good point about double spaces, but wouldn't using array_filter() remove any empty elements? And exploding on a space would keep the periods at the end of words because it's before the space (or double space). Also, what non-word characters would have to be exploded upon?

Interesting discussion about a topic I know woefully little about (let me know if this is now veering way off topic and I should open this in the miscellaneous section), so thanks much for expounding and explaining!

grissom · May 20, 2015

Why not just use explode to separate the sentence into an array, then just run through the array changing every odd (since the array starts at zero) word?

A bit of rough code:

$words = explode(" ", $sentence);

for ($n = 0; $n < count($words); $n++) {
	// even indexes keep the original word, odd indexes become 'MOD'
	echo ($n % 2 == 0 ? $words[$n] : 'MOD') . ' ';
}

Barand · May 20, 2015

Why not just use explode to separate the sentence into an array, then just run through the array changing every odd (since the array starts at zero) word?

See reply #6 ^ by requinix

requinix · May 20, 2015

I can't even read your string.

I can explain them. First one is pretty simple:

- () capture

- [^]+ one or more characters that are not

- \pL Unicode characters that are classified as "letters"

- \' or apostrophes

- /u flag to enable UTF-8/Unicode mode

preg_split() would normally work like explode(), but with the PREG_SPLIT_DELIM_CAPTURE flag it also returns anything captured. Thus the explanation in the comments.
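
To make the effect of the flag concrete, here is a small comparison on a made-up sample string:

```php
<?php
$sample = "Don't stop, now";  // hypothetical sample text

// Without the flag: separators are discarded, much like explode().
$wordsOnly = preg_split('/[^\pL\']+/u', $sample);

// With the capturing group and the flag: separators are kept in between.
$wordsAndSeparators = preg_split('/([^\pL\']+)/u', $sample, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($wordsOnly);
print_r($wordsAndSeparators);
```

Because the second array still contains every separator, implode('', ...) rebuilds the original string exactly.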

Second is longer but really not that much more complicated:

- [\pL\']+ A word consisting of Unicode letters or apostrophes

- [^\pL\']+ Things that aren't letters or apostrophes (like spaces or periods)

- {$n-1} Repeat those $n-1 times

- [\pL\']+ The last word

$1 will be all but the last word.
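
The captures can be inspected with preg_match(). A sketch for $n = 3 on a made-up sentence:

```php
<?php
$n = 3; // every third word
$pattern = '/(([\pL\']+[^\pL\']+){' . ($n - 1) . '})[\pL\']+/u';
preg_match($pattern, 'The black cat is sitting', $m);

// $m[0] is the whole match: $n words including the one to replace.
// $m[1] is everything before the last word (what '$1' refers to).
// $m[2] is the inner group, which only keeps its last repetition.
print_r($m);
```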

Good point about double spaces, but wouldn't using array_filter() remove any empty elements?

By default it would also remove the string "0", because "0" == false. You'd have to pass a callback function that explicitly checks $word == "".
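
A short sketch of the difference, using a made-up array:

```php
<?php
$pieces = ['a', '', '0', 'b'];  // hypothetical pieces, including the string "0"

// Default behaviour: every falsy value is dropped, including "0".
$default = array_filter($pieces);

// With a callback: only genuinely empty strings are dropped.
$strict = array_filter($pieces, function ($word) {
    return $word !== '';
});

print_r($default);
print_r($strict);
```

Note that array_filter() preserves the original keys in both cases; array_values() can reindex the result if needed.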

And exploding on a space would keep the periods at the end of words because it's before the space (or double space).

Right, but what if one of those words were being replaced? "A very short sentence." would become "A MOD short MOD" - no more period. You'd have to detect that period (or comma, or exclamation point, or...) and add it back in.
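
The lost period can be reproduced, and one possible way of keeping it is to replace only the letters inside each piece. This is a sketch under the same "letters and apostrophes" definition of a word, not a complete solution:

```php
<?php
$sentence = 'A very short sentence.';

// Plain explode(): the period is part of the last piece, so it is overwritten.
$naive = explode(' ', $sentence);
for ($i = 1; $i < count($naive); $i += 2) {
    $naive[$i] = 'MOD';
}
$naiveResult = implode(' ', $naive);

// Replacing only the first letter/apostrophe run in each piece keeps the punctuation.
$careful = explode(' ', $sentence);
for ($i = 1; $i < count($careful); $i += 2) {
    $careful[$i] = preg_replace('/[\pL\']+/u', 'MOD', $careful[$i], 1);
}
$carefulResult = implode(' ', $careful);

echo $naiveResult, "\n", $carefulResult, "\n";
```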

Also, what non-word characters would have to be exploded upon?

Anything but letters and apostrophes. They're non-word characters which means they also act as word separators. Arguably hyphens could be included in there too, like how "non-word" is either one word or two depending how you look at it, except hyphens are used for a lot more than that so you'd need more sophisticated logic like "hyphens are considered word characters if they have a letter on both sides, otherwise not" which would suck.
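
For what it's worth, the "letter on both sides" rule can be approximated with lookarounds when matching words. This is only a sketch of that one rule with a made-up sample string; it does not cover the other uses of hyphens mentioned above:

```php
<?php
// A word is a run of letters/apostrophes, where a hyphen also counts
// if it has a letter directly before it and directly after it.
$wordPattern = '/(?:[\pL\']|(?<=\pL)-(?=\pL))+/u';

$sample = 'A non-word test - well--known';  // hypothetical sample text
preg_match_all($wordPattern, $sample, $m);
$found = $m[0];
print_r($found);
```

The lone hyphen and the double hyphen are treated as separators, while the one in "non-word" stays inside the word.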

megatr0n · May 20, 2015

Thanks. It has a few hiccups, but I can do the rest; it selects null characters.

maxxd · May 21, 2015

@requinix - good points all, and thank you for the in-depth explanation!

requinix · May 21, 2015

Thanks. It has a few hiccups, but I can do the rest; it selects null characters.

It does... what, exactly?
