What is the best method?


4 replies to this topic

#1 solarisuser

solarisuser
  • Members
  • Advanced Member
  • 122 posts

Posted 14 September 2006 - 08:09 PM

I need to check the database before I add a new entry, so that a near-duplicate of an existing entry does not get added.

For example, I already have "Dell" in the database, and someone wants to enter "Dell Inc". 
I tried using MySQL's LIKE and REGEXP functions but it was not helpful.

My goal is to compare "Dell Inc" to "Dell" and detect whether they are similar.

Thanks

#2 effigy

effigy
  • Staff Alumni
  • Advanced Member
  • 3,600 posts
  • LocationIL

Posted 14 September 2006 - 08:34 PM

This is challenging in a few ways. If "Dell Inc" were already in the database, it would be easy to match "Dell" within "Dell Inc". The reverse is not: a substring search for "Dell Inc" will never match a stored "Dell". The next approach would be to split the string on whitespace and look for each piece, but how do you know which parts belong to the company name? We know "Inc" can be dropped from "Dell Inc", but what about longer company names such as "Johnson and Johnson"? My only thought at the moment is to make a list of known suffixes and their variations (e.g., Corporation, Corp., Inc., Incorporated) and strip these from the end of the string before running the match.
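To make the suffix-stripping idea concrete, here is a minimal sketch in Python (used only for illustration; the thread does not name a language for this step). The suffix list and the function names `normalize_company` and `is_duplicate` are hypothetical, and the list would need extending for real data.

```python
import re

# Hypothetical suffix list; extend it to suit your own data.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company", "ltd", "llc"}

def normalize_company(name):
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    # Remove suffix words from the end only, and never empty the name entirely.
    while len(words) > 1 and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)

def is_duplicate(new_name, existing_names):
    """True if the normalized new name equals any normalized existing name."""
    key = normalize_company(new_name)
    return any(normalize_company(n) == key for n in existing_names)
```

With this, `is_duplicate("Dell Inc.", ["Dell"])` is true, while "Johnson and Johnson" is left whole because none of its words are in the suffix list. It only catches variations you have listed, so it is a first filter rather than a complete answer.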

#3 solarisuser

solarisuser
  • Members
  • Advanced Member
  • 122 posts

Posted 14 September 2006 - 09:04 PM

OK, thanks. I might just split the name on spaces, then compare each piece.
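A rough sketch of the split-and-compare idea, in Python for illustration only. The function names `shared_words` and `find_candidates` are made up for this example, and the existing names are assumed to have been loaded from the database already.

```python
def shared_words(new_name, existing_name):
    """Return the set of words two names have in common, ignoring case."""
    return set(new_name.lower().split()) & set(existing_name.lower().split())

def find_candidates(new_name, existing_names):
    """List existing names that share at least one word with the new name."""
    return [n for n in existing_names if shared_words(new_name, n)]
```

Note that `find_candidates("Dell Inc", ["Dell", "Sun", "Acme Inc"])` returns both "Dell" and "Acme Inc", because "Inc" counts as a shared word. That is the weakness raised earlier in the thread: without knowing which words matter, generic words produce false matches.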


#4 bholbrook

bholbrook
  • Members
  • Advanced Member
  • 31 posts

Posted 14 September 2006 - 10:09 PM

I have successfully implemented something quite like this.

You need to compare values of the same length.

If DELL is already in the database and you are entering DELL INC, you only want to compare the first 4 letters and see if they're the same.
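One way to read the same-length comparison is to truncate both names to the length of the shorter one. This is a sketch in Python under that assumption; `same_prefix` is a hypothetical helper name, not something from a library.

```python
def same_prefix(a, b):
    """Compare two names over the length of the shorter one, ignoring case."""
    n = min(len(a), len(b))
    # An empty name should not match everything.
    return n > 0 and a[:n].lower() == b[:n].lower()
```

Here `same_prefix("DELL", "DELL INC")` is true and `same_prefix("DELL INC", "DELL FINANCIAL")` is false. The catch is that short entries match too eagerly: "Sun" would also match "Sunoco".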

You can also use PHP's similar_text function, which takes two strings and can report the percentage that is the same (through its optional third argument), and reject new entries that score above a certain threshold.
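The threshold idea can be sketched as follows. This uses Python's `difflib.SequenceMatcher` as a stand-in for `similar_text`; the two use different matching algorithms, so the scores will not necessarily agree. The helper names and the 60 percent threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher

def similarity_percent(a, b):
    """Case-insensitive similarity of two strings as a percentage (0 to 100)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def too_similar(new_name, existing_names, threshold=60.0):
    """Return existing names whose similarity to new_name meets the threshold."""
    return [n for n in existing_names if similarity_percent(new_name, n) >= threshold]
```

For example, `too_similar("Dell Inc", ["Dell", "Sun"])` returns only "Dell". The threshold needs tuning against real data, since two different companies that share a long common prefix can also score high.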

While DELL and DELL INC are the same, DELL INC and DELL FINANCIAL are not, so dropping the last word will not help.

#5 solarisuser

solarisuser
  • Members
  • Advanced Member
  • 122 posts

Posted 15 September 2006 - 12:27 AM

Thanks

I can see a few problems with this, so I don't think I'll do it.

I have "Sun" listed, but not "Sun Microsystems".






