megatr0n Posted May 19, 2015

I am trying to build a function that can uniformly change words, e.g. "The black cat is sitting on the mat." I want every second word (or every nth word) replaced, so the result looks like "The MOD cat MOD sitting MOD the MOD". The words should be changed at a uniform interval. Currently I have:

    $alltext = 'The black cat is sitting on the mat';
    //using regex break text into words into an array
    $pattern = '/([a-zA-Z]|\xC3[\x80-\x96\x98-\xB6\xB8-\xBF]|\xC5[\x92\x93\xA0\xA1\xB8\xBD\xBE]){1,}/';
    $n_words = preg_match_all($pattern, $input_str, $match_arr, PREG_OFFSET_CAPTURE);
    $wordcnt = 0;
    foreach ($match_arr[0] as $val) {
        $wordcnt++;
        $aword = $val[0];
        $wordpos = $val[1];
        $alltext = str_ireplace($wordpos, $aword, 'MOD'.changer($aword),$alltext);
    }

    function changer($in) {
        $in = $in.'IFY';
        return $in;
    }

This does not replace every nth word uniformly. How can I do this?
requinix Posted May 19, 2015

Use preg_split() to split the sentence into words, run a loop that replaces every other word, then implode() it all back together. Translating the $pattern you have now into one that matches non-word characters will be easier if you can tell me what Unicode ranges you're trying to match. Which, by the way, is done much more easily with the actual Unicode support that PCRE has instead of constructing the bytes yourself.

Or preg_replace() every pair of words with the first word + the replacement. Again, the regex will be a lot nicer if you use PCRE's Unicode support.
megatr0n Posted May 19, 2015 (Author)

The regex that I have is not the best. It took me a while to come up with it, and it does find the matches I want. Using preg_split() seems like the way to go.
requinix Posted May 20, 2015 (Solution)

Alright, based on those byte ranges it looks like you're aiming for just letters. You can do them all with Unicode, but I suggest you include apostrophes in there too (for contractions):

    /[\pL']+/u

The preg_split method is a bit tricky to wrap your head around, but the code is simple:

    $words = preg_split('/([^\pL\']+)/u', $alltext, -1, PREG_SPLIT_DELIM_CAPTURE);
    $replace = "MOD";
    $n = 2; // every second word
    $i = ($n - 1) * 2;
    // $words includes the words *and the spaces*, alternating, because you'll need the spaces when you implode() it back together
    // [0] is a word, [1] is a space, [2] is a word, and so on
    // [0], [2], [4], ... is every word ($n=1, 0+2i),
    // [2], [6], [10], ... is every second word ($n=2, 2+4i),
    // [4], [10], [16], ... is every third word ($n=3, 4+6i),
    // or another way, [($n-1)*2 + ($n*2)i]
    $wordcount = count($words);
    for ($i = ($n - 1) * 2; $i < $wordcount; $i += $n * 2) {
        $words[$i] = $replace;
    }
    $words = implode('', $words);

The preg_replace() version is just as simple but the regex is a bit longer: count out $n-1 words and spaces, capture that, match the last word, then replace the lot with the capture and your replacement word. The advantage is that you have preg_replace() doing all the work for you.

    $replace = "MOD";
    $n = 2; // every second word
    $words = preg_replace('/(([\pL\']+[^\pL\']+){' . ($n - 1) . '})[\pL\']+/u', '$1' . preg_quote($replace), $alltext);
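The original question also ran each selected word through changer() rather than swapping in a fixed string. A minimal sketch of how the preg_replace() idea above might be adapted for that, using preg_replace_callback(). The function name transformEveryNth() is hypothetical (it does not appear in the thread), and the inner group is made non-capturing so that the nth word is capture group 2:

```php
<?php
// Hypothetical helper, not from the thread: pass every $n-th word to $fn
// and keep everything else (other words, spaces, punctuation) untouched.
function transformEveryNth(string $text, int $n, callable $fn): string
{
    if ($n < 1) {
        throw new InvalidArgumentException('$n must be at least 1');
    }
    // Group 1: the first $n-1 words together with their separators.
    // Group 2: the $n-th word, which is handed to the callback.
    $pattern = '/((?:[\pL\']+[^\pL\']+){' . ($n - 1) . '})([\pL\']+)/u';
    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($fn) {
            return $m[1] . $fn($m[2]);
        },
        $text
    );
    if ($result === null) {
        // preg_* functions return null on failure (for example invalid UTF-8 with /u).
        throw new RuntimeException('Regex replacement failed');
    }
    return $result;
}

// changer() as defined in the original question.
function changer($in) {
    return $in . 'IFY';
}

$alltext = 'The black cat is sitting on the mat';

// Fixed replacement, as in the posts above.
echo transformEveryNth($alltext, 2, function ($w) { return 'MOD'; }), "\n";
// The MOD cat MOD sitting MOD the MOD

// Per-word transformation, as in the original attempt.
echo transformEveryNth($alltext, 2, function ($w) { return 'MOD' . changer($w); }), "\n";
// The MODblackIFY cat MODisIFY sitting MODonIFY the MODmatIFY
```

Because the callback returns a plain string, nothing in the replacement is interpreted as a backreference, so no escaping of the replacement text is needed in this variant.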
maxxd Posted May 20, 2015

I'm certainly not the best at regular expressions, but wouldn't it be easier to explode the sentence on ' ' and loop through the resulting array?

    $words = explode(' ', $sentence);
    for($i=0; $i<count($words); $i++){
        // odd indexes are the 2nd, 4th, ... words
        if($i%2 == 1){
            $words[$i] = 'MOD';
        }
    }
    $sentence = implode(' ', $words);
    print("<p>{$sentence}</p>");

Or am I overlooking something obvious?
requinix Posted May 20, 2015

> I'm certainly not the best at regular expressions, but wouldn't it be easier to explode the sentence on ' ' and loop through the resulting array?

Space alone isn't enough if you want to be really pedantic. There are other symbols to consider, like periods at the end of sentences, that would be lost if you didn't make sure to insert them back in. If there are two spaces then explode() will return an empty string between them.

"Explode"ing on non-word characters is the next step, but that's too sophisticated for explode() to handle. You'd need regular expressions. And then you'd need to capture what you "exploded" on so you'd be sure to keep track of it. And now you've arrived at the preg_split() option I gave.
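To make the difference concrete, here is a small sketch comparing the two splits on one input. The sample sentence (with a deliberate double space) and the variable names are only illustrative:

```php
<?php
// Illustrative input: note the double space and the trailing period.
$sentence = 'A very  short sentence.';

// explode() on a single space: an empty string appears between the two
// spaces, and the period stays attached to the last word.
$parts = explode(' ', $sentence);
print_r($parts);
// ['A', 'very', '', 'short', 'sentence.']

// preg_split() on runs of non-letters, keeping the delimiters:
// words sit at even indexes, separators at odd indexes.
$tokens = preg_split('/([^\pL\']+)/u', $sentence, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($tokens);
// ['A', ' ', 'very', '  ', 'short', ' ', 'sentence', '.', '']

// Joining the tokens with an empty separator restores the input exactly.
var_dump(implode('', $tokens) === $sentence); // bool(true)
```

Note that the last element of $tokens is an empty string, because the sentence ends with a delimiter. It sits in a "word" slot, so a loop that overwrites word indexes can also overwrite that empty entry.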
maxxd Posted May 20, 2015

I'll have to do some digging into regex - as I should have said, I'm terrible at them. I can't even read your string.

Good point about double spaces, but wouldn't using array_filter() remove any empty elements? And exploding on a space would keep the periods at the end of words, because the period comes before the space (or double space). Also, what non-word characters would have to be exploded upon?

Interesting discussion about a topic I know woefully little about (let me know if this is now veering way off topic and I should open this in the miscellaneous section), so thanks much for expounding and explaining!
grissom Posted May 20, 2015

Why not just use explode to separate the sentence into an array, then run through the array changing every odd (since the array starts at zero) word? A bit of rough code:

    $words = explode(" ", $sentence);
    for ($n = 0; $n < count($words); $n++) {
        // even indexes keep the original word; odd indexes become 'MOD'
        echo ($n % 2 == 0 ? $words[$n] : 'MOD') . ' ';
    }
Barand Posted May 20, 2015 (edited)

> why not just use explode to separate the sentence into an array then just run through the array changing every odd (since the array starts at zero) word.

See reply #6 ^ by requinix
requinix Posted May 20, 2015 (edited)

> I can't even read your string.

I can explain them. The first one is pretty simple:
- () capture
- [^]+ one or more characters that are not
- \pL Unicode characters that are classified as "letters"
- \' or apostrophes
- /u flag to enable UTF-8/Unicode mode

preg_split() would normally work like explode(), but with the PREG_SPLIT_DELIM_CAPTURE flag it also returns anything captured. Thus the explanation in the comments.

The second is longer but really not that much more complicated:
- [\pL\']+ A word consisting of Unicode letters or apostrophes
- [^\pL\']+ Things that aren't letters or apostrophes (like spaces or periods)
- {$n-1} Repeat those $n-1 times
- [\pL\']+ The last word

$1 will be all but the last word.

> Good point about double spaces, but wouldn't using array_filter() remove any empty elements?

By default it would also remove a string "0" because that ==false. You'd have to use a callback function to actually do $word == "".

> And exploding on a space would keep the periods at the end of words because it's before the space (or double space).

Right, but what if one of those words were being replaced? "A very short sentence." would become "A MOD short MOD" - no more period. You'd have to detect that period (or comma, or exclamation point, or...) and add it back in.

> Also, what non-word characters would have to be exploded upon?

Anything but letters and apostrophes. They're non-word characters, which means they also act as word separators. Arguably hyphens could be included in there too, like how "non-word" is either one word or two depending on how you look at it, except hyphens are used for a lot more than that, so you'd need more sophisticated logic like "hyphens are considered word characters if they have a letter on both sides, otherwise not", which would suck.
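Two of the points above can be checked with a few lines of PHP. This is only a sketch; the sample array and variable names are made up for illustration:

```php
<?php
// Illustrative data: a list that contains both "0" and an empty string.
$words = ['room', '0', '', 'is', 'free'];

// Without a callback, array_filter() drops every falsy value,
// so the string "0" disappears along with the empty string.
$default = array_values(array_filter($words));
print_r($default);
// ['room', 'is', 'free']

// With a callback that only rejects empty strings, "0" survives.
$nonEmpty = array_values(array_filter($words, function ($w) {
    return $w !== '';
}));
print_r($nonEmpty);
// ['room', '0', 'is', 'free']

// The lost-period case: explode on a space, replace every second word.
$parts = explode(' ', 'A very short sentence.');
for ($i = 1; $i < count($parts); $i += 2) {
    $parts[$i] = 'MOD';
}
$replaced = implode(' ', $parts);
echo $replaced, "\n";
// A MOD short MOD
```

array_values() is used here only to renumber the keys, since array_filter() preserves the original keys.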
megatr0n Posted May 20, 2015 (Author)

Thanks. It has a few hiccups, but I can do the rest; it selects null characters.
maxxd Posted May 21, 2015

@requinix - good points all, and thank you for the in-depth explanation!
requinix Posted May 21, 2015

> Thanks. It has a few hickups, but I can do the rest; it selects null characters.

It does... what, exactly?