solarisuser (September 14, 2006):
I need to check the database before I add a new entry, so that I can prevent near-duplicates from being added. For example, "Dell" is already in the database, and someone wants to enter "Dell Inc". I tried MySQL's LIKE and REGEXP operators, but they didn't help. My goal is to compare "Dell Inc" with "Dell" and detect whether the two are similar. Thanks
effigy (September 14, 2006):
This is challenging in a few ways. If "Dell Inc" were already in the database, it would be easy to match "Dell" within "Dell Inc". The reverse is not: "Dell Inc" will never match inside "Dell". The next approach would be to split the string on whitespace and look for each piece, but how do you know which pieces are essential to the company name? We know "Inc" can be dropped from "Dell Inc", but what about longer company names such as "Johnson and Johnson"? My only thought at the moment is to make a list of known suffixes and their variations (e.g., Corporation, Corp., Inc., Incorporated) and strip these from the end of the string before running the match.
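A minimal Python sketch of the suffix-stripping idea, assuming a hand-maintained suffix list. The names `LEGAL_SUFFIXES`, `normalize_company`, `naive_contains` and `find_duplicate` are hypothetical, not part of any library. `naive_contains` mimics the one-way behaviour of a `LIKE '%...%'` search, while `find_duplicate` normalizes both names before comparing them exactly.

```python
import re
from typing import List, Optional

# Hypothetical list of legal-form suffixes (lower case, no trailing period).
# Longer forms come first; extend the list for your own data.
LEGAL_SUFFIXES = ["incorporated", "corporation", "company", "corp", "inc", "ltd", "llc", "co"]

# Matches one suffix at the very end of the name, preceded by spaces and/or a
# comma and optionally followed by a period, e.g. " Inc.", ", Corp", " LLC".
_SUFFIX_RE = re.compile(
    r"[\s,]+(?:" + "|".join(map(re.escape, LEGAL_SUFFIXES)) + r")\.?$",
    re.IGNORECASE,
)


def normalize_company(name: str) -> str:
    """Collapse whitespace, repeatedly strip trailing legal suffixes, lower-case."""
    name = " ".join(name.split())
    while True:
        stripped = _SUFFIX_RE.sub("", name)
        if stripped == name:  # nothing left to strip
            return name.lower()
        name = stripped


def naive_contains(new_name: str, stored_name: str) -> bool:
    """Roughly what `stored_name LIKE '%new_name%'` does under a case-insensitive
    collation: it only finds the new name inside a longer stored one."""
    return new_name.lower() in stored_name.lower()


def find_duplicate(new_name: str, stored_names: List[str]) -> Optional[str]:
    """Return the first stored name that normalizes to the same string, if any."""
    target = normalize_company(new_name)
    for stored in stored_names:
        if normalize_company(stored) == target:
            return stored
    return None


if __name__ == "__main__":
    print(naive_contains("Dell", "Dell Inc"))            # True
    print(naive_contains("Dell Inc", "Dell"))            # False: the direction problem
    print(find_duplicate("Dell Inc.", ["Sun", "Dell"]))  # Dell
    print(find_duplicate("Dell Financial", ["Dell"]))    # None
```

Comparing exactly after normalization keeps "Dell Financial" distinct from "Dell", but it will not catch abbreviations, misspellings, or suffixes missing from the list.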
solarisuser (thread author, September 14, 2006):
OK, thanks. I might just split things on spaces, then compare each piece.
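The split-and-compare approach could look roughly like this Python sketch (`tokens` and `token_matches` are illustrative names, not from the thread). It flags every stored name that shares at least one word with the new one, which also exposes the weakness effigy raised about multi-word names:

```python
from typing import List, Set


def tokens(name: str) -> Set[str]:
    """Split on whitespace, drop surrounding periods/commas, lower-case."""
    pieces = (piece.strip(".,").lower() for piece in name.split())
    return {piece for piece in pieces if piece}


def token_matches(new_name: str, stored_names: List[str]) -> List[str]:
    """Return every stored name that shares at least one word with new_name."""
    new_tokens = tokens(new_name)
    return [stored for stored in stored_names if new_tokens & tokens(stored)]


if __name__ == "__main__":
    stored = ["Dell", "Sun", "Procter and Gamble"]
    print(token_matches("Dell Inc", stored))             # ['Dell']
    print(token_matches("Johnson and Johnson", stored))  # ['Procter and Gamble'] (false hit on "and")
    print(token_matches("Dell Financial", stored))       # ['Dell'] (a different company)
```

Common words such as "and" produce false hits, so in practice this would likely need a list of ignorable words, and it still cannot tell "Dell Financial" apart from "Dell".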
bholbrook (September 14, 2006):
I have successfully implemented something quite like this. You need to compare values of the same length: if "DELL" is already stored and the new entry is "DELL INC", you only want to compare the first 4 letters and see whether they're the same. You can also use PHP's similar_text() function, which compares two strings and can also give you the percentage of similarity (through its optional third argument), and reject new entries that reach a certain threshold. While "DELL" and "DELL INC" are the same company, "DELL INC" and "DELL FINANCIAL" are not, so simply throwing away the last part of the name will not help.
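Both checks can be sketched in Python as follows. Since `similar_text()` is a PHP built-in, `similar_chars`/`similar_percent` below are a hand port of the approach it uses (take the longest common substring, then recurse on the pieces to its left and right; percentage = 2 * common / (len(a) + len(b)) * 100). Treat it as a sketch rather than a byte-for-byte replica, especially for multibyte text. The function names and the 80% `THRESHOLD` are hypothetical, and the lower-casing is an added assumption (PHP's `similar_text()` is case-sensitive).

```python
from typing import Tuple


def prefix_match(new_name: str, stored_name: str) -> bool:
    """Compare only as many leading characters as the shorter name has."""
    a, b = new_name.lower(), stored_name.lower()
    n = min(len(a), len(b))
    return n > 0 and a[:n] == b[:n]


def _longest_common_substring(a: str, b: str) -> Tuple[int, int, int]:
    """Return (length, start_in_a, start_in_b) of the first longest common run."""
    best = (0, 0, 0)
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best[0]:
                best = (k, i, j)
    return best


def similar_chars(a: str, b: str) -> int:
    """Count common characters: longest common run, then recurse left and right."""
    length, i, j = _longest_common_substring(a, b)
    if length == 0:
        return 0
    return (length
            + similar_chars(a[:i], b[:j])
            + similar_chars(a[i + length:], b[j + length:]))


def similar_percent(a: str, b: str) -> float:
    """Similarity as a percentage of the combined length of both strings."""
    if not a and not b:
        return 0.0
    return similar_chars(a, b) * 2 * 100 / (len(a) + len(b))


THRESHOLD = 80.0  # hypothetical cut-off; tune it against your own data


def looks_like_duplicate(new_name: str, stored_name: str) -> bool:
    """Flag the new name if either the prefix check or the score says so."""
    return (prefix_match(new_name, stored_name)
            or similar_percent(new_name.lower(), stored_name.lower()) >= THRESHOLD)


if __name__ == "__main__":
    print(prefix_match("DELL INC", "DELL"))                          # True
    print(prefix_match("DELL INC", "DELL FINANCIAL"))                # False
    print(round(similar_percent("DELL", "DELL INC"), 1))             # 66.7
    print(round(similar_percent("DELL INC", "DELL FINANCIAL"), 1))   # 72.7
```

With this port, the unrelated pair "DELL INC"/"DELL FINANCIAL" scores higher than the matching pair "DELL"/"DELL INC", so a percentage threshold on its own cannot separate them. Note also that the prefix check would flag "DELL FINANCIAL" against a stored "DELL".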
solarisuser (thread author, September 15, 2006):
Thanks. I can see a few problems with this, so I don't think I'll do it. For example, I have "Sun" listed, but not "Sun Microsystems".