Jump to content

Regex for Anagram Solver


MMDE

Recommended Posts

I've written an anagram solver before. I just don't think the solution I came up with then was very good. Sure it didn't take more than a second to load and it went through a word list of pretty much any English word. This time around I want a very short code that only relies on regex. :)

 

I want to through all words in a word list and check if any of the words are an anagram of a word I input. I can write everything on my own except for the regex. I also want two regex.

 

1st regex:

What I want the regex to match is any string that uses the same letters, the same amount of letters, and each letter the same amount of times, as the input. Order of the letters doesn't matter.

 

Example:

 

Strings:

heert

three

there

tree

tee

heet

teeth

treth

t

 

input:

herte

 

matches:

heert

three

there

 

^ I guess I can just sort the letters in the words and the input and check if they are the same, but I want to learn some regex.

 

2nd regex:

What I want the regex to match is any string that uses the same letters and maximum the same amount of times each letter as the input. Again the order doesn't matter.

 

Example:

 

Strings:

heert

three

there

tree

tee

heet

teeth

treth

t

 

input:

herte

 

matches:

heert

three

there

tree

tee

heet

t

Link to comment
Share on other sites

Is it possible to write a class in regex that limits the amount of times a letter can be used?

 

I kind of realize regex really isn't the most efficient, and that I should write my own algorithm, I just want to learn. Recursion along with permutations this many times might not be a too good idea, but I guess it's not far from what such a regex would have done.

 

I already wrote how to sole the first problem with a simple algorithm;

sort the letters in the input in alphabetical order

loop through each word in wordlist:

if same length, sort the letters in the word in alphabetical order, if same

 

To make this even further more efficient, make a tree out of the words.

first branch would be amount of letters in word.

second branch would be the words sorted after first letter, where first letter is the first letter after the word's letters are sorted in alphabetical order.

then the next could be a balanced binary tree with all the words which belongs there

 

lol

 

I just want to learn some regex, so the first question is perhaps the most essential to learn something from this.

I guess I could write it in a similar way as explained above about the comparison algorithm, obviously the word list would be different:

sort the letters in the input in alphabetical order

loop through each word in wordlist:

sort the letters in the word in alphabetical order

keep a position of where you are in the word you are comparing it to while looping through the input ... urgh this part is getting hard to explain. I think I will just program it! ;o

 

<?php

function solve_anagrams($input, $word_list){
$anagrams = array();
$input = strtolower($input);
$input_array = str_split($input);
sort($input_array);
$input_length = count($input_array);
foreach($word_list AS $word){
	if(count($word)>$input_length){
		continue;
	}
	$anagram = TRUE;
	$word = strtolower($word);
	$word_array = str_split($word);
	sort($word_array);
	$position = 0;
	foreach($word_array AS $character){
		if($position==$input_length){
			$anagram = FALSE;
			break;
		}
		while($character!=$input_array[$position]){
			$position++;
			if($position==$input_length || $character<$input_array[$position]){
				$anagram = FALSE;
				break;
			}
		}
		$position++;
	}
	if($anagram){
		$anagrams[] = $word;
	}
}
return $anagrams;
}

$input = 'herte';

$word_list = array('heert', 'three', 'there', 'tree', 'tee', 'heet', 'teeth', 'treth', 't');

$anagrams = solve_anagrams($input, $word_list);

foreach($anagrams AS $anagram){
echo $anagram."<br/>\n";
}

?>

Could probably have made it more abstract using a class and more functions, but yeah, the main point is there still...

Please improve upon the code if you like! :)

 

In any case, you could say it would look a lot cleaner, maybe not as efficient, with a preg_match instead ;D

Link to comment
Share on other sites

I tried to make it load my wordlist, and now it says Undefined offset: at 23 and 25... not really sure why yet. xD

I mean, before and inside those two places do I check if it will exist.

nevermind, the problem is this line:

if($position==$input_length){

should have been:

if($position>=$input_length){

 

Here's the revised code:

<?php

function solve_anagram($input, $word_list){
$anagrams = array();
$input = strtolower($input);
$input_array = str_split($input);
sort($input_array);
$input_length = count($input_array);
foreach($word_list AS $word){
	if(count($word)>$input_length){
		continue;
	}
	$anagram = TRUE;
	$word = strtolower($word);
	$word_array = str_split($word);
	sort($word_array);
	$position = 0;
	foreach($word_array AS $character){
		if($position>=$input_length){
			$anagram = FALSE;
			break;
		}
		while($character!=$input_array[$position]){
			$position++;
			if($position==$input_length || $character<$input_array[$position]){
				$anagram = FALSE;
				break;
			}
		}
		$position++;
	}
	if($anagram){
		$anagrams[] = $word;
	}
}
return $anagrams;
}

$input = 'herte';

$word_list = explode("\r\n", file_get_contents('wordlist.txt'));

$start_time = microtime(TRUE);
$anagrams = solve_anagram($input, $word_list);
$time = microtime(TRUE)-$start_time;

echo "input: $input<br/>\nanagrams found: ".count($anagrams)."<br/>\nmicroseconds spent: $time<br/>\n";

foreach($anagrams AS $anagram){
echo $anagram."<br/>\n";
}

?>

 

My "word" list contains over 440k words, 92 anagrams found, and it spent less than 3 milliseconds. :)

I know this could have been more effective if the wordlist had been made into a good search tree beforehand.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.