need help checking text similarity in php


3 replies to this topic

#1 oracle259

oracle259
  • Members
  • Advanced Member
  • 119 posts

Posted 01 October 2006 - 08:59 PM

I find the similar_text() function inaccurate and not very useful. Does anyone know of a script that improves on it, or a different approach that gives more accurate results?


Thanks

#2 Kris

Kris
  • Staff Alumni
  • Advanced Member
  • 2,755 posts
  • Location: The Internet

Posted 01 October 2006 - 09:03 PM

Have you tried the levenshtein() function instead of similar_text()?

#3 oracle259

oracle259
  • Members
  • Advanced Member
  • 119 posts

Posted 01 October 2006 - 09:04 PM

Yup, but it seems to do nothing more than return strlen($word1) - strlen($word2), which doesn't do much.

#4 Kris

Kris
  • Staff Alumni
  • Advanced Member
  • 2,755 posts
  • Location: The Internet

Posted 02 October 2006 - 06:35 AM

Quote (oracle259): "Yup, but it seems to do nothing more than return strlen($word1) - strlen($word2), which doesn't do much."

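Not quite. The function returns the Levenshtein distance between two strings, which is defined as the minimal number of single-character replacements, insertions, or deletions needed to transform string 1 into string 2. The difference in length is only a lower bound on that number, so two strings of equal length can still be far apart.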
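To see the two values side by side, here is a small sketch. The helper length_difference() is a made-up name for illustration only; levenshtein() and strlen() are the built-in PHP functions. It assumes PHP 7.1 or later for the type hints and array destructuring.

```php
<?php
// Hypothetical helper (not part of PHP): absolute difference in string lengths.
function length_difference(string $a, string $b): int
{
    return abs(strlen($a) - strlen($b));
}

$pairs = [
    ['cat', 'dog'],        // same length, every character differs
    ['kitten', 'sitting'], // lengths differ by one
    ['carrrot', 'carrot'], // one extra character
];

foreach ($pairs as [$a, $b]) {
    printf(
        "%s / %s: length difference = %d, levenshtein = %d\n",
        $a,
        $b,
        length_difference($a, $b),
        levenshtein($a, $b)
    );
}
// Prints:
// cat / dog: length difference = 0, levenshtein = 3
// kitten / sitting: length difference = 1, levenshtein = 3
// carrrot / carrot: length difference = 1, levenshtein = 1
```

The two numbers only coincide when the strings differ purely by inserted or deleted characters, as in the last pair.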

The example from the manual is quite useful as a starting point:
<?php
// input misspelled word
$input = 'carrrot';

// array of words to check against
$words  = array('apple','pineapple','banana','orange',
               'radish','carrot','pea','bean','potato');

// no shortest distance found, yet
$shortest = -1;

// loop through words to find the closest
foreach ($words as $word) {

   // calculate the distance between the input word,
   // and the current word
   $lev = levenshtein($input, $word);

   // check for an exact match
   if ($lev == 0) {

       // closest word is this one (exact match)
       $closest = $word;
       $shortest = 0;

       // break out of the loop; we've found an exact match
       break;
   }

   // if this distance is less than the next found shortest
   // distance, OR if a next shortest word has not yet been found
   if ($lev <= $shortest || $shortest < 0) {
       // set the closest match, and shortest distance
       $closest  = $word;
       $shortest = $lev;
   }
}

echo "Input word: $input\n";
if ($shortest == 0) {
   echo "Exact match found: $closest\n";
} else {
   echo "Did you mean: $closest?\n";
}

?>
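If you want to reuse that loop, one possible way is to wrap it in a function. This is only a sketch, not something from the manual: the name closest_word() and the $maxDistance cut-off are my own assumptions. It also differs from the manual example in two ways: it uses a strict comparison, so the first of several equally close words wins, and it returns null when the word list is empty or nothing is close enough. It assumes PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): return the word closest to $input,
 * or null if no word is within $maxDistance edits.
 */
function closest_word(string $input, array $words, int $maxDistance = 2): ?string
{
    $closest  = null;
    $shortest = PHP_INT_MAX; // no distance found yet

    foreach ($words as $word) {
        $lev = levenshtein($input, $word);

        // strict comparison: the first of several tied candidates is kept
        if ($lev < $shortest) {
            $closest  = $word;
            $shortest = $lev;
        }

        // exact match, no need to look further
        if ($lev === 0) {
            break;
        }
    }

    // reject matches that are too far away to be a plausible suggestion
    return $shortest <= $maxDistance ? $closest : null;
}

$words = ['apple', 'pineapple', 'banana', 'orange',
          'radish', 'carrot', 'pea', 'bean', 'potato'];

var_dump(closest_word('carrrot', $words));   // string(6) "carrot"
var_dump(closest_word('xylophone', $words)); // NULL
```

The cut-off of 2 is arbitrary; a longer input can reasonably tolerate more edits, so you may want to scale it with strlen($input).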




