
levenshtein function help


chordsoflife


I'm working on a "Did you mean" feature for my site, and it's working 100%. My problem is that I don't really understand what's going on; I only managed to implement it. So what I'd like to do now is limit the suggestions unless it actually has a half decent one, but I'm not sure how to do that.

 

Here's my code so far:

$bands = q("SELECT fldBand, fldConcertDate FROM tblConcert WHERE fldConcertDate >= '$date'");
$closest  = '';
$shortest = -1; // -1 = no candidate examined yet

// loop through band names to find the closest
foreach ($bands as $row) {
    $band = $row['fldBand']; // quote the array key; a bare fldBand is an undefined constant
    $lev  = levenshtein($search_for, $band);
    if ($lev == 0) {
        // exact match, stop searching
        $closest  = $band;
        $shortest = 0;
        break;
    }
    if ($lev <= $shortest || $shortest < 0) {
        // closer than anything seen so far (or the first candidate)
        $closest  = $band;
        $shortest = $lev;
    }
}

echo "<p>Sorry, upcoming shows for <b>" . htmlspecialchars($search_for) . "</b> could not be found.";
// > 0 rather than != 0, so nothing is suggested when $bands was empty ($shortest still -1)
if ($shortest > 0) {
    echo " Did you mean: <a href=\"?fldArtist=" . urlencode($closest) . "\">" . htmlspecialchars($closest) . "</a>?";
}
echo "</p>\n";
echo "<p>Do you know of a show that we don't have listed? <a href=\"contact.php\">Contact us</a>!</p>\n";

 

Thanks!

https://forums.phpfreaks.com/topic/134359-levenshtein-function-help/

Yeah, it's essentially the "carrrot" to "carrot" example from the PHP website. If I asked how to create that functionality, I'd get a link there anyway ;)

 

I understand how it works and what it does, I'm just not sure what the actual numbers are saying.


 

That makes no sense. If you understand how it works and what it does, then... well, what's left to understand? What exactly do you mean by "actual numbers"?

Sorry, I didn't think I was being unclear, but I'll try to explain.

 

I understand what the function does. I understand the point of using it. What I don't understand is what the numbers in the function are, or rather, what the effect of changing them would be. I'm talking about the manual example, but a response applying it to mine would be great.

 

In case it's not clear, the numbers I mean are the -1 that $shortest is initialized to and the 0 in if ($lev == 0).

 

Then, my original question:

"So what I'd like to do now is limit the suggestions unless it actually has a half decent one, but I'm not sure how to do that."

Well, what I'm saying is that if you understand what the function does, then you should understand the numbers. -1, as the comment in the manual's example says, is a flag meaning no match has been found yet. 0 means you found a 100% match, and any value of 1 or more is the Levenshtein distance between $string1 and $string2. The description in the manual tells you:

 

The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform str1 into str2.
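To make that definition concrete, here's a minimal sketch of the standard dynamic-programming way to compute the number in plain PHP. `editDistance()` is a hypothetical name for illustration only; for real use stick with the built-in levenshtein(), which (like this sketch) works on bytes and charges 1 per edit by default.

```php
<?php
// Textbook dynamic-programming Levenshtein distance (byte-based, every edit costs 1).
function editDistance(string $a, string $b): int
{
    $m = strlen($a);
    $n = strlen($b);
    // At the start of row $i, $prev[$j] = distance between the first $i-1 chars of $a
    // and the first $j chars of $b. Row 0: turning "" into $j chars takes $j inserts.
    $prev = range(0, $n);
    for ($i = 1; $i <= $m; $i++) {
        $curr = [$i]; // turning $i chars into "" takes $i deletions
        for ($j = 1; $j <= $n; $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // delete $a[$i-1]
                $curr[$j - 1] + 1,     // insert $b[$j-1]
                $prev[$j - 1] + $cost  // replace (free if the chars are equal)
            );
        }
        $prev = $curr;
    }
    return $prev[$n];
}

echo editDistance('rats', 'cat'), "\n"; // 2
```

Each cell keeps the cheapest of the three edits, which is exactly the "minimal number of characters you have to replace, insert or delete".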

 

So for instance:

 

cat vs. cat : nothing has to be changed, so the distance is 0. 0 == match.

rat vs. cat : only 1 character has to be replaced to change rat to cat, so the distance would be 1. 

rats vs. cat : 1 character has to be changed (r to c) and one char has to be removed (s), so distance would be 2.

dog vs. cat : all 3 characters in dog have to be changed to make cat, so the distance would be 3.
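The four examples above can be checked directly with PHP's built-in levenshtein(); a short sketch:

```php
<?php
// Reproduce the examples with the built-in levenshtein().
$pairs = [['cat', 'cat'], ['rat', 'cat'], ['rats', 'cat'], ['dog', 'cat']];
foreach ($pairs as [$a, $b]) {
    printf("%s vs. %s: %d\n", $a, $b, levenshtein($a, $b));
}
// cat vs. cat: 0
// rat vs. cat: 1
// rats vs. cat: 2
// dog vs. cat: 3
```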

 

So, the smaller the number, the closer the match. It's up to you to decide the maximum distance you'll accept as a suggestion; just look at the typical length of the strings in your data.
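Applied to the original question, one way to limit suggestions is to throw away the closest candidate when its distance is above a cutoff. A minimal sketch follows; `suggest()`, the sample band names, and the cutoff rule (roughly one edit per three characters of the search term, minimum 1) are hypothetical choices for illustration, to be tuned against your own data.

```php
<?php
// Return the closest candidate, or null if there's an exact match,
// no candidates, or the best candidate is too far away.
function suggest(string $search, array $candidates): ?string
{
    $closest  = null;
    $shortest = -1; // -1: nothing examined yet
    foreach ($candidates as $candidate) {
        $lev = levenshtein($search, $candidate);
        if ($lev === 0) {
            return null; // exact match: nothing to suggest
        }
        if ($shortest < 0 || $lev < $shortest) {
            $closest  = $candidate;
            $shortest = $lev;
        }
    }
    // Hypothetical cutoff: allow about one edit per three characters, at least 1.
    $maxDistance = max(1, intdiv(strlen($search), 3));
    return ($closest !== null && $shortest <= $maxDistance) ? $closest : null;
}

$bands = ['Metallica', 'Muse', 'Radiohead'];
var_dump(suggest('Metalica', $bands)); // string(9) "Metallica"
var_dump(suggest('Beatles', $bands));  // NULL
```

Scaling the cutoff with the search term's length lets short names tolerate one typo while longer names tolerate a few more; a fixed cutoff such as 2 is simpler and works if your band names are of similar length.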

 

Archived

This topic is now archived and is closed to further replies.
