opencombatclan Posted November 18, 2009 Share Posted November 18, 2009 Hello everyone. I have a rather difficult problem to solve, and I was wondering if someone of you could think with me. Lets say I have the following string shdjssueeekdssdeewsdwkdskdskdsskadkdssdsadkdssadkds29skds I want to make an array out of this string which contains how often a character combination, anything larger as 2 characters, occurs in my given string. So the array will be like: Array ( ['sh'] => '1', ['ee'] => '2', ['kds'] = '8' ) and so one To stop the array from growing really big, I want only the character-combinations that occur more as 8 times in my string in the array. I hope its clear, and that someone can help me out Quote Link to comment https://forums.phpfreaks.com/topic/181996-most-common-string-in-string/ Share on other sites More sharing options...
salathe Posted November 18, 2009 Share Posted November 18, 2009 Do you want to count overlapping character-combinations? For example, given "eeee" does "ee" occur only twice or three times? Quote Link to comment https://forums.phpfreaks.com/topic/181996-most-common-string-in-string/#findComment-959976 Share on other sites More sharing options...
opencombatclan Posted November 18, 2009 Author Share Posted November 18, 2009 No overlapping Quote Link to comment https://forums.phpfreaks.com/topic/181996-most-common-string-in-string/#findComment-960183 Share on other sites More sharing options...
salathe Posted November 18, 2009 Share Posted November 18, 2009 Here's one way, the process is to basically loop over the different sizes of character-combinations (from 2 to half of the subject string [you can't repeat more than half of the string without overlapping!]) and for each of those sizes loop over the different character-combinations of that length within the string. We can count the number of non-overlapping times that a chunk of the strings repeats (e.g. "kds" appears 8 times) with substr_count(). $subject = 'shdjssueeekdssdeewsdwkdskdskdsskadkdssdsadkdssadkds29skds'; $min_num = 8; $all_len = strlen($subject); // For each chunk length from 2 to half-of-string-length for ($i = 2, $max_len = floor($all_len / 2); $i <= $max_len; $i++) { // For each available offset in the subject string for ($offset = 0, $max_offset = $all_len - $i; $offset < $max_offset; $offset++) { // Get the chunk and count how often it occurs in the subject $chunk = substr($subject, $offset, $i); $count = substr_count($subject, $chunk); // Only record this chunk if it occurs at least $min_num times if ($count >= $min_num) { $repeats[$chunk] = $count; } } } // Sort values in decreasing order arsort($repeats); var_dump($repeats); /* array(3) { ["ds"]=> int(9) ["kds"]=> int( ["kd"]=> int( } */ Quote Link to comment https://forums.phpfreaks.com/topic/181996-most-common-string-in-string/#findComment-960218 Share on other sites More sharing options...
opencombatclan Posted November 18, 2009 Author Share Posted November 18, 2009 Thank you very much for your help. This indeed works for smaller strings. However, some bigger and more complicated strings take a huge time to process. Is there any way we can speed up this process? Maybe using reg expressions? I found a similar question here: http://stackoverflow.com/questions/1182208/find-longest-repeating-strings But the given example does not work (it only outputs the word Array). Any ideas? Quote Link to comment https://forums.phpfreaks.com/topic/181996-most-common-string-in-string/#findComment-960318 Share on other sites More sharing options...
salathe Posted November 18, 2009 Share Posted November 18, 2009 Looking through lots and lots of string combinations will take a "huge time" to process, a lot of function calls are happening: for the example string above substr_count and substr will both be called 1134 times. A regex approach might be worth considering but might end up being a nightmare. Quote Link to comment https://forums.phpfreaks.com/topic/181996-most-common-string-in-string/#findComment-960459 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.