
percentage difference between strings


joe92


Hmm, not quite sure how to do this one...

 

If I have lots of strings saved in a MySQL table and one was...

"A man and a dog took a nice walk in the park"

 

If a user then wanted to input another string into the table, but I wanted to check that the new string was at least 5% different from all the others in the table, how would I go about doing this?

 

Is it possible to do this via MySQL, or would I have to pull all the strings out into a PHP array and process it that way... somehow?

 


First, you will have to define "different". Take this string:

a MAN AND A DOG TOOK A NICE WALK IN THE PARK

that string is 100% different from the one you supplied ... or it is 100% the same. Or is it somewhere in between?
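To make that concrete, here is a minimal sketch in Python. The function name `positional_mismatches` and the `ignore_case` flag are made up for illustration, and comparing character by character at the same position is only one possible definition of "different". The point is just that the same pair of strings gives two very different answers depending on whether you decide case matters.

```python
def positional_mismatches(a: str, b: str, ignore_case: bool = False) -> int:
    """Count positions where the two strings disagree.

    Hypothetical helper: characters are compared pairwise at the same index,
    and any extra characters in the longer string count as mismatches.
    """
    if ignore_case:
        # casefold() is a more aggressive lower() meant for caseless matching
        a, b = a.casefold(), b.casefold()
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return differing + abs(len(a) - len(b))


original = "A man and a dog took a nice walk in the park"
shouted = "a MAN AND A DOG TOOK A NICE WALK IN THE PARK"

# Case-sensitive: every letter differs, only the spaces match.
print(positional_mismatches(original, shouted))
# Case-insensitive: the strings are identical.
print(positional_mismatches(original, shouted, ignore_case=True))
```

Whichever rule you pick (case, whitespace, punctuation), apply it to both strings before you count anything.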

 

 

Once you have defined "different", you have to decide how to quantify the difference. To calculate a percentage, you have to be able to count the "differences" and divide.

 

 

After you have defined and quantified "different", then we can answer the question as to whether or not we can calculate and test the amount of difference using SQL.

 

There is an algorithm called Levenshtein distance which does exactly this... somewhat. PHP has it built in as levenshtein().

 

What it does is give you "the minimal number of characters you have to replace, insert or delete to transform str1 into str2". Once you have this number, you can then compare it with the actual length of the string.

 

So for instance, the Levenshtein distance between "I eat food" and "I ate food" would be 2 (delete the "e" at the start of "eat", then add an "e" at the end). You would then take that number and divide it by the length of "I eat food", AKA str1.

2/10 = 20% different. At least that's how I'd do it, I'm sure there are better ways.
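Here is a rough sketch of the whole check, written in Python so the distance calculation itself is visible (in PHP you would call the built-in levenshtein() instead of writing your own). The names `percent_difference` and `is_different_enough` are made up for illustration. One assumption that differs slightly from the description above: the distance is divided by the length of the longer string rather than str1, so the result can never go over 100%. As far as I know MySQL has no built-in Levenshtein function, so this sketch assumes the existing strings have already been pulled out of the table into a list.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of single-character inserts, deletes or replacements
    needed to turn a into b (standard dynamic-programming version)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # replace (free if the chars match)
            ))
        previous = current
    return previous[-1]


def percent_difference(a: str, b: str) -> float:
    """Edit distance as a percentage of the longer string's length (assumption)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return 100.0 * levenshtein(a, b) / longest


def is_different_enough(new: str, existing: list[str], threshold: float = 5.0) -> bool:
    """True if new is at least threshold percent different from every existing string."""
    return all(percent_difference(new, old) >= threshold for old in existing)


print(levenshtein("I eat food", "I ate food"))         # 2
print(percent_difference("I eat food", "I ate food"))  # 20.0

stored = ["A man and a dog took a nice walk in the park"]
# One changed letter out of 44 is about 2.3%, so this one is rejected.
print(is_different_enough("A man and a dog took a nice walk in the dark", stored))  # False
```

Bear in mind this compares the new string against every stored string, so it gets slow as the table grows. If you decided that case or whitespace should not count as a difference, normalise both strings before passing them in.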

 
