
Misspelling Generator


rodrico101


Hello,

 

I am trying to write a script that generates misspelled words to use in searches (e.g. searching auction sites for bargains).

I want to search by simple typos (e.g. Beatles => ebatles, beatels, etc.)

and also by letters close to them on the keyboard (e.g. B => v, n, g, h, f, etc.)

I'm not quite sure where to start. Once all the words are generated, I want to put them all together and send them to a web search.

 

Any help would be appreciated.

 

Rod
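The two kinds of typo described above (swapped neighbouring letters and slips onto a nearby key) can be sketched roughly as below. This is only a minimal sketch: the function names `transpositions` and `keyboardSlips`, the partial QWERTY neighbour map, and the ` OR ` separator used to join the variants are all assumptions for illustration, not something a particular search site is known to accept.

```php
<?php
// Hypothetical helper: every variant made by swapping two adjacent letters.
function transpositions($word) {
    $out = array();
    for ($i = 0; $i < strlen($word) - 1; $i++) {
        if ($word[$i] === $word[$i + 1]) {
            continue; // swapping identical letters changes nothing
        }
        $typo = $word;
        $typo[$i] = $word[$i + 1];
        $typo[$i + 1] = $word[$i];
        $out[] = $typo;
    }
    return $out;
}

// Hypothetical helper: every variant made by hitting a neighbouring key.
// $neighbours maps a letter to a string of assumed nearby keys.
function keyboardSlips($word, $neighbours) {
    $out = array();
    for ($i = 0; $i < strlen($word); $i++) {
        $key = $word[$i];
        if (!isset($neighbours[$key])) {
            continue; // no neighbours listed for this letter
        }
        foreach (str_split($neighbours[$key]) as $near) {
            $typo = $word;
            $typo[$i] = $near;
            $out[] = $typo;
        }
    }
    return $out;
}

// Assumed, deliberately partial QWERTY map; extend it for the full keyboard.
$neighbours = array(
    'b' => 'vghn',
    'e' => 'wsdr',
    'a' => 'qwsz',
    't' => 'rfgy',
    'l' => 'kop',
    's' => 'awedxz',
);

$word  = 'beatles';
$typos = array_unique(array_merge(
    transpositions($word),
    keyboardSlips($word, $neighbours)
));

// Join the variants into one search string (the separator is an assumption).
echo implode(' OR ', $typos);
```

For "beatles" the transposition list includes both "ebatles" and "beatels" from the examples above. The number of variants grows quickly with word length, so it may be worth capping how many are sent in a single search.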


I don't think there is any automated technique for achieving this. You could keep a dictionary of commonly misspelled words and search within it. Or, in reverse, keep a dictionary of common words and, when a user searches for "ebatles", scramble it until it finds a word in the database. It sounds dull, but I have no smart ideas for this.

I have a system that generates regexes to catch intentionally misspelled words; this might help you get started.

<?php
// Map each letter to a regex fragment matching its common obfuscations.
// Single-character alternatives go in a character class (no spaces, or a
// space would match too); multi-character ones such as "ph" for "f" or
// "vv" for "w" need an alternation group instead.
$replace = array(
    'a' => '[aA@]',
    'b' => '(?:[bB]|[Ili]3)',
    'c' => '[cC(kK]',
    'd' => '[dD]',
    'e' => '[eE3]',
    'f' => '(?:[fF]|[pP][hH])',
    'g' => '[gG]',
    'h' => '[hH]',
    'i' => '[iIl!1]',
    'j' => '[jJ]',
    'k' => '[cC(kK]',
    'l' => '[lL1!i]',
    'm' => '[mM]',
    'n' => '[nN]',
    'o' => '[oO0]',
    'p' => '[pP]',
    'q' => '[qQ]',
    'r' => '[rR]',
    's' => '[sS$5]',
    't' => '[tT7]',
    'u' => '[uUvV]',
    'v' => '[vVuU]',
    'w' => '(?:[wW]|vv|VV)',
    'x' => '[xX]',
    'y' => '[yY]',
    'z' => '[zZ2]',
);
$input = isset($_POST['word']) ? $_POST['word'] : '';
$word = str_split(strtolower($input));
foreach ($word as $i => $char) {
    // Letters become their fragment; digits, spaces and anything else are
    // kept as literals (escaped so they cannot break the regex).
    $word[$i] = isset($replace[$char]) ? $replace[$char] : preg_quote($char, '/');
}
//$word = "/" . implode('', $word) . "/";
echo implode('', $word);
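
The commented-out line above hints at wrapping the result in delimiters so it can be used as a pattern. A rough sketch of that step follows; the function name `buildPattern`, the sample listing text, and the shortened letter map are assumptions for illustration, and the fragments are written without spaces so that each class matches only the intended characters.

```php
<?php
// Hypothetical wrapper: turn a word into a delimited regex using a
// per-letter map of fragments. Unmapped characters are matched literally.
function buildPattern($word, $map) {
    $parts = array();
    foreach (str_split(strtolower($word)) as $char) {
        $parts[] = isset($map[$char]) ? $map[$char] : preg_quote($char, '/');
    }
    return '/' . implode('', $parts) . '/';
}

// Shortened map covering only the letters of the sample word.
$map = array(
    'b' => '(?:[bB]|[Ili]3)',
    'e' => '[eE3]',
    'a' => '[aA@]',
    't' => '[tT7]',
    'l' => '[lL1!i]',
    's' => '[sS$5]',
);

$pattern = buildPattern('beatles', $map);

// preg_match returns 1 when the pattern is found and 0 when it is not.
var_dump(preg_match($pattern, 'rare B3@7les vinyl'));
var_dump(preg_match($pattern, 'rolling stones'));
```

Note that this catches deliberate character substitutions rather than transposed letters, so it would not match "ebatles".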

This may not work for you, but you could use a function like levenshtein (http://www.php.net/manual/en/function.levenshtein.php). It tells you how far apart two words are, as the number of single-character edits needed to turn one into the other. You would use it on the fly, comparing the search term against the items in the database, like this:

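<?php
// $result is assumed to be a mysqli_result whose rows have `id` and `name`
// columns (the mysql_* functions were removed in PHP 7).
$word  = strtolower($_GET['word']);
$short = -1;   // lowest distance seen so far; -1 means nothing compared yet
$ans   = null; // name of the closest row
$id    = null; // id of the closest row
while ($row = mysqli_fetch_assoc($result)) {
	$lev = levenshtein($word, strtolower($row['name']), 1, 1, 1);
	if ($lev == 0) { // perfect match, no need to look further
		$ans = $row['name'];
		$id = $row['id'];
		$short = 0;
		break;
	}
	if ($lev <= $short || $short < 0) { // first row, or a better match than the current one
		$ans = $row['name'];
		$id = $row['id'];
		$short = $lev;
	}
}
?>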

 

That code takes a MySQL query result and finds the row whose name is the closest match to the input ($_GET['word']).
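
To try the same idea without a database, here is a small sketch that runs the comparison over a plain array. The function name `closestMatch` and the sample item names are made up for illustration; only `levenshtein` itself is a built-in.

```php
<?php
// Hypothetical stand-in for the query result: id => name.
$items = array(
    1 => 'Beatles',
    2 => 'Beach Boys',
    3 => 'Bee Gees',
);

// Return the item whose name has the smallest edit distance to $input.
function closestMatch($input, $items) {
    $best   = null; // closest name so far
    $bestId = null; // its id
    $short  = -1;   // its distance; -1 means nothing compared yet
    foreach ($items as $id => $name) {
        $lev = levenshtein(strtolower($input), strtolower($name));
        if ($short < 0 || $lev < $short) {
            $short  = $lev;
            $best   = $name;
            $bestId = $id;
            if ($lev === 0) {
                break; // exact match
            }
        }
    }
    return array('id' => $bestId, 'name' => $best, 'distance' => $short);
}

print_r(closestMatch('ebatles', $items));
```

One thing to keep in mind: levenshtein counts two swapped letters as two edits, so "ebatles" is at distance 2 from "beatles". If you reject matches above some threshold, it probably needs to allow at least 2 to catch transposition typos.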
