TheBurtle Posted December 23, 2009

Here's a riddle for PHP coders out there...

I want to take a string and find patterns. RegExp, you say? Exactly, but it gets a little more complex than that. You see, I don't know what I am looking for. Neither will the code. The only thing we know is that we want to find the largest "patterns" in a file and replace each one with a number.

Here's an example of what I am talking about, "pseudocode" style:

Text to be scanned:
God is great, God is good, Let us thank him for our food.

Scan complete. Largest pattern: "God is g". Found twice, replaced with "1", noted and removed. Now we have:
1reat, 1ood, Let us thank him for our food.

Second scan complete. Largest pattern: "ood". Found twice, replaced with "2", noted and removed. Now we have:
1reat, 12, Let us thank him for our f2.

Third scan complete. Largest pattern: "r_" (underscore is whitespace). Found twice, replaced with "3", noted and removed. Now we have:
1reat, 12, Let us thank him fo3ou3f2.

No more patterns of two or greater. Therefore, pattern matching is over.

Now, imagine this on a work like Hamlet. I think you see what I am saying. Thanks in advance!
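For readers following along, the scan-and-replace loop described above could be sketched roughly like this. It is only a brute-force illustration, not anyone's posted solution: the function names `longestRepeat` and `compress` are made up here, it assumes PHP 7.1 or later, and it assumes the input contains no digits (otherwise the numeric markers become ambiguous). Ties between equally long patterns go to the earliest one, so after the first two scans the result may differ from the walk-through (for instance, ", " also repeats at that point).

```php
<?php
/**
 * Returns the longest substring that occurs at least twice without
 * overlapping, or null if nothing of at least $minLen characters repeats.
 * Ties go to the earliest starting position.
 */
function longestRepeat(string $text, int $minLen = 2): ?string
{
    $n = strlen($text);
    // Longest candidates first; a non-overlapping repeat is at most half the text.
    for ($len = intdiv($n, 2); $len >= $minLen; $len--) {
        for ($start = 0; $start + 2 * $len <= $n; $start++) {
            $candidate = substr($text, $start, $len);
            // Search only past the candidate so the two copies cannot overlap.
            if (strpos($text, $candidate, $start + $len) !== false) {
                return $candidate;
            }
        }
    }
    return null;
}

/**
 * Repeatedly replaces the longest repeat with the next number.
 * Returns [compressed text, dictionary of number => pattern].
 */
function compress(string $text, int $minLen = 2): array
{
    $dictionary = [];
    $next = 1;
    while (true) {
        // Only accept patterns longer than the marker, so every
        // replacement shrinks the text and the loop must end.
        $need = max($minLen, strlen((string) $next) + 1);
        $pattern = longestRepeat($text, $need);
        if ($pattern === null) {
            break;
        }
        $dictionary[$next] = $pattern;
        $text = str_replace($pattern, (string) $next, $text);
        $next++;
    }
    return [$text, $dictionary];
}

print_r(compress("God is great, God is good, Let us thank him for our food."));
```

This tries every candidate length at every offset, so it is fine for a sentence but would be far too slow for something the size of Hamlet without a smarter search.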
jonsjava Posted December 23, 2009

I don't usually help unless you provide code, but this seemed like a fun challenge. Here's what I got to work. It's ugly, and it won't do what you think it will. Remember: the script does not know actual words, only length. You can fix it from there. If you need help, just post your questions here.

<?php
print_r(getpatterns("God is great, God is good, Let us thank him for our food.", 3));

/**
 * Finds patterns with a minimum length of $minlen
 *
 * @param string $string
 * @param int $minlen
 * @return array
 */
function getpatterns($string, $minlen = 1) {
    $string = strtolower($string);
    $maxcount = strlen($string);
    $array = array();
    $count = $minlen;
    while ($count <= $maxcount) {
        // Cut the text into consecutive chunks of $count characters
        $tmp_array = str_split($string, $count);
        foreach ($tmp_array as $val) {
            // Record how many times each chunk occurs in the whole string
            $array[$val] = substr_count($string, $val);
        }
        $count++;
    }
    // Most frequent chunks first
    arsort($array);
    return $array;
}
?>
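One reason the script above "won't do what you think" is that `str_split` only yields chunks aligned to multiples of the chunk size, so most substrings are never examined. A possible variation, sketched under the assumption that every offset should be looked at, slides one character at a time instead. The name `repeatedSubstrings` is invented for this example; it is case-sensitive and counts overlapping occurrences, unlike `substr_count`.

```php
<?php
/**
 * Counts every substring of $len characters, at every offset, and
 * returns those that occur at least twice, most frequent first.
 */
function repeatedSubstrings(string $text, int $len): array
{
    $counts = [];
    for ($i = 0; $i + $len <= strlen($text); $i++) {
        $piece = substr($text, $i, $len);
        $counts[$piece] = ($counts[$piece] ?? 0) + 1;
    }
    // Drop everything that was seen only once.
    $repeated = array_filter($counts, function ($c) {
        return $c > 1;
    });
    arsort($repeated);
    return $repeated;
}

print_r(repeatedSubstrings("God is great, God is good, Let us thank him for our food.", 8));
```

Calling it with decreasing lengths until something comes back is one way to get at the "largest pattern" the original question asks for.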
TheBurtle Posted December 23, 2009 (Author)

Absolutely great! Thanks a ton. You're right, it isn't what I expected, but it is that great big nudge in the right direction.
laffin Posted December 23, 2009

It's quite simple, but ya have to have some wits to it.

Original: God is great, God is good, Let us thank him for our food.
Compressed: \0reat\1\0\2\1Let us thank him\3or our\3\2.
#0=>'God is g'
#1=>', '
#2=>'ood'
#3=>' f'

As shown, this was the output of the code I had created, but this sounds more like a school project than a riddle.

To achieve this I use a sliding window mechanism:
1. Grab the first portion of text with the minimum window size.
2. Compare it against the rest of the string.
3. If a match is found, store the result, increase the window, and try the match again.
4. Increment the first portion's offset and repeat.
TheBurtle Posted December 23, 2009 (Author)

Unfortunately, at 38, no longer school worthy. It was a little more of an idle curiosity that I had never been able to figure out. I would start, then over-complicate the code, get frustrated, and walk away from it for a year or so until I remembered it again. It's not even like it's necessarily usable. I guess you could call it a poor man's text compression program. But, hey, thanks again, jonsjava, for the point in the right direction.
russthebarber Posted December 23, 2009

I have another approach which is not exactly what you are looking for but might get you going in another direction.

1. Make arrays of 10-word sequences.
2. Then look for anywhere where array1(1,2,3) is the same as either 1,2,3 or 2,3,4 or 3,4,5, etc., in another array.

Does that make sense? I am more clued up with MySQL than PHP, so I would dump the whole thing into a DB and work with the data from there.
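The word-based idea above could look something like the following in plain PHP, without a database. This is a loose interpretation rather than russthebarber's actual method: instead of comparing 10-word arrays pairwise, it simply counts every run of `$n` consecutive words. The function name `repeatedWordRuns` is hypothetical, and punctuation is left attached to the words.

```php
<?php
/**
 * Counts every run of $n consecutive words and returns the runs that
 * occur more than once, most frequent first.
 */
function repeatedWordRuns(string $text, int $n = 3): array
{
    // Split on whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $words = preg_split('/\s+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $counts = [];
    for ($i = 0; $i + $n <= count($words); $i++) {
        $run = implode(' ', array_slice($words, $i, $n));
        $counts[$run] = ($counts[$run] ?? 0) + 1;
    }
    // Keep only runs seen at least twice.
    $repeated = array_filter($counts, function ($c) {
        return $c > 1;
    });
    arsort($repeated);
    return $repeated;
}

print_r(repeatedWordRuns("God is great, God is good, Let us thank him for our food.", 2));
```

Working on whole words finds repeated phrases but not fragments like "God is g", so it answers a slightly different question than the character-level approaches in this thread.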
laffin Posted December 23, 2009

38, not too bad. Here is what I used:

<?php
$quote = $text = "God is great, God is good, Let us thank him for our food.";
$minwin = 2;
$pos = $cnt = 0;
$matches = array(); // initialised so the foreach below is safe when nothing matches
$tlen = strlen($text);
while ($pos < ($tlen - $minwin)) {
    $pos2 = $pos + $minwin;
    $winsize = $minwin;
    $matchsize = 0;
    // <= so that a window ending on the last character is also checked
    while (($pos2 + $winsize) <= $tlen) {
        // Grow the window while both slices are identical; === avoids
        // PHP's loose comparison treating numeric-looking slices as equal
        while (substr($text, $pos, $winsize) === substr($text, $pos2, $winsize)) {
            if ($winsize > $matchsize) {
                $matchsize = $winsize;
            }
            $winsize++;
        }
        $pos2++;
    }
    if ($matchsize) {
        // Store the longest match found at $pos and replace it with a "\N" marker
        $match = substr($text, $pos, $matchsize);
        $matches[$cnt] = $match;
        $text = str_replace($match, "\\{$cnt}", $text);
        $pos++; // together with the increment below, skips the marker just inserted
        $cnt++;
        $tlen = strlen($text);
    }
    $pos++;
}
echo "Original: {$quote}<br />\n";
echo "Compressed: {$text}<br />\n";
foreach ($matches as $key => $val) {
    echo " #{$key}=>'{$val}'<br />\n";
}
?>

Although some code can be added (returning the biggest array first is not incorporated yet), it's almost there.

The reason I used \0 instead of just 1, 2, 3, 4 is so you can find the replacements quickly, but I'm sure ya can use other delimiter marks.
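Since the `\N` markers are meant to make the replacements easy to find, here is a rough sketch of going the other way and expanding them again. It is not part of laffin's script; `expandMarkers` is a name made up for this example, and it assumes the original text contained no backslashes or digits and that no marker was split by a later pattern.

```php
<?php
/**
 * Expands "\N" markers back into the substrings they stand for.
 * $matches maps marker number => original substring.
 */
function expandMarkers(string $compressed, array $matches): string
{
    // Undo the replacements newest-first: a later pattern may contain
    // earlier markers, and "\10" has to be expanded before "\1".
    krsort($matches);
    foreach ($matches as $index => $pattern) {
        $compressed = str_replace('\\' . $index, $pattern, $compressed);
    }
    return $compressed;
}

// Compressed text and dictionary as posted earlier in the thread.
$compressed = '\0reat\1\0\2\1Let us thank him\3or our\3\2.';
$matches = [0 => 'God is g', 1 => ', ', 2 => 'ood', 3 => ' f'];
echo expandMarkers($compressed, $matches), "\n";
// prints: God is great, God is good, Let us thank him for our food.
```

Expanding in reverse order matters because each pattern was found in text that already contained the earlier markers.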