MMDE Posted August 19, 2012 Share Posted August 19, 2012 I've written an anagram solver before. I just don't think the solution I came up with then was very good. Sure it didn't take more than a second to load and it went through a word list of pretty much any English word. This time around I want a very short code that only relies on regex. I want to through all words in a word list and check if any of the words are an anagram of a word I input. I can write everything on my own except for the regex. I also want two regex. 1st regex: What I want the regex to match is any string that uses the same letters, the same amount of letters, and each letter the same amount of times, as the input. Order of the letters doesn't matter. Example: Strings: heert three there tree tee heet teeth treth t input: herte matches: heert three there ^ I guess I can just sort the letters in the words and the input and check if they are the same, but I want to learn some regex. 2nd regex: What I want the regex to match is any string that uses the same letters and maximum the same amount of times each letter as the input. Again the order doesn't matter. Example: Strings: heert three there tree tee heet teeth treth t input: herte matches: heert three there tree tee heet t Quote Link to comment https://forums.phpfreaks.com/topic/267318-regex-for-anagram-solver/ Share on other sites More sharing options...
requinix Posted August 20, 2012 Share Posted August 20, 2012 Regular expressions can't do this in any remotely reasonable way. Choose a different hammer. Quote Link to comment https://forums.phpfreaks.com/topic/267318-regex-for-anagram-solver/#findComment-1370749 Share on other sites More sharing options...
xyph Posted August 20, 2012 Share Posted August 20, 2012 Check out recursion and permutations. Quote Link to comment https://forums.phpfreaks.com/topic/267318-regex-for-anagram-solver/#findComment-1370750 Share on other sites More sharing options...
MMDE Posted August 20, 2012 Author Share Posted August 20, 2012 Is it possible to write a class in regex that limits the amount of times a letter can be used? I kind of realize regex really isn't the most efficient, and that I should write my own algorithm, I just want to learn. Recursion along with permutations this many times might not be a too good idea, but I guess it's not far from what such a regex would have done. I already wrote how to sole the first problem with a simple algorithm; sort the letters in the input in alphabetical order loop through each word in wordlist: if same length, sort the letters in the word in alphabetical order, if same To make this even further more efficient, make a tree out of the words. first branch would be amount of letters in word. second branch would be the words sorted after first letter, where first letter is the first letter after the word's letters are sorted in alphabetical order. then the next could be a balanced binary tree with all the words which belongs there lol I just want to learn some regex, so the first question is perhaps the most essential to learn something from this. I guess I could write it in a similar way as explained above about the comparison algorithm, obviously the word list would be different: sort the letters in the input in alphabetical order loop through each word in wordlist: sort the letters in the word in alphabetical order keep a position of where you are in the word you are comparing it to while looping through the input ... urgh this part is getting hard to explain. I think I will just program it! ;o <?php function solve_anagrams($input, $word_list){ $anagrams = array(); $input = strtolower($input); $input_array = str_split($input); sort($input_array); $input_length = count($input_array); foreach($word_list AS $word){ if(count($word)>$input_length){ continue; } $anagram = TRUE; $word = strtolower($word); $word_array = str_split($word); sort($word_array); $position = 0; foreach($word_array AS $character){ if($position==$input_length){ $anagram = FALSE; break; } while($character!=$input_array[$position]){ $position++; if($position==$input_length || $character<$input_array[$position]){ $anagram = FALSE; break; } } $position++; } if($anagram){ $anagrams[] = $word; } } return $anagrams; } $input = 'herte'; $word_list = array('heert', 'three', 'there', 'tree', 'tee', 'heet', 'teeth', 'treth', 't'); $anagrams = solve_anagrams($input, $word_list); foreach($anagrams AS $anagram){ echo $anagram."<br/>\n"; } ?> Could probably have made it more abstract using a class and more functions, but yeah, the main point is there still... Please improve upon the code if you like! In any case, you could say it would look a lot cleaner, maybe not as efficient, with a preg_match instead Quote Link to comment https://forums.phpfreaks.com/topic/267318-regex-for-anagram-solver/#findComment-1370801 Share on other sites More sharing options...
MMDE Posted August 20, 2012 Author Share Posted August 20, 2012 I tried to make it load my wordlist, and now it says Undefined offset: at 23 and 25... not really sure why yet. xD I mean, before and inside those two places do I check if it will exist. nevermind, the problem is this line: if($position==$input_length){ should have been: if($position>=$input_length){ Here's the revised code: <?php function solve_anagram($input, $word_list){ $anagrams = array(); $input = strtolower($input); $input_array = str_split($input); sort($input_array); $input_length = count($input_array); foreach($word_list AS $word){ if(count($word)>$input_length){ continue; } $anagram = TRUE; $word = strtolower($word); $word_array = str_split($word); sort($word_array); $position = 0; foreach($word_array AS $character){ if($position>=$input_length){ $anagram = FALSE; break; } while($character!=$input_array[$position]){ $position++; if($position==$input_length || $character<$input_array[$position]){ $anagram = FALSE; break; } } $position++; } if($anagram){ $anagrams[] = $word; } } return $anagrams; } $input = 'herte'; $word_list = explode("\r\n", file_get_contents('wordlist.txt')); $start_time = microtime(TRUE); $anagrams = solve_anagram($input, $word_list); $time = microtime(TRUE)-$start_time; echo "input: $input<br/>\nanagrams found: ".count($anagrams)."<br/>\nmicroseconds spent: $time<br/>\n"; foreach($anagrams AS $anagram){ echo $anagram."<br/>\n"; } ?> My "word" list contains over 440k words, 92 anagrams found, and it spent less than 3 milliseconds. I know this could have been more effective if the wordlist had been made into a good search tree beforehand. Quote Link to comment https://forums.phpfreaks.com/topic/267318-regex-for-anagram-solver/#findComment-1370811 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.