solarisuser

What is the best method?


Before I add a new entry, I need to compare it against what is already in the database and prevent near-duplicates from being added.

For example, I already have "Dell" in the database, and someone wants to enter "Dell Inc".
I tried MySQL's LIKE and REGEXP operators, but they did not help.

My goal is to compare "Dell Inc" to "Dell" and detect whether they are similar.

Thanks

This is challenging in a few ways. If "Dell Inc" were already in the database, it would be easy to match "Dell" against it, because "Dell" is a substring of "Dell Inc". The reverse is not true--"Dell Inc" will never match a stored "Dell". The next approach would be to split the string on whitespace and look for each piece, but how do you know which parts are essential to the company name? We know "Inc" can be dropped from "Dell Inc", but what about longer company names such as "Johnson and Johnson"? My only thought at the moment is to make a list of known suffixes and their variations--e.g., Corporation, Corp., Inc., Incorporated, etc.--and strip these from the end of the string before running the match.
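A rough sketch of the suffix-stripping idea, written in Python for illustration rather than PHP or MySQL. The suffix list and the function name `normalize_company` are assumptions made up for this example; a real list would need to be built from your own data.

```python
import re

# Hypothetical list of legal suffixes; extend it to suit your own data.
COMPANY_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation",
    "co", "company", "ltd", "llc",
}

def normalize_company(name):
    """Lowercase a company name and strip known suffixes from its end."""
    # Remove punctuation (e.g. the dot in "Inc.") and split on whitespace.
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    # Drop trailing suffix words, but always keep at least one word.
    while len(words) > 1 and words[-1] in COMPANY_SUFFIXES:
        words.pop()
    return " ".join(words)

print(normalize_company("Dell Inc."))            # dell
print(normalize_company("Johnson and Johnson"))  # johnson and johnson
```

Comparing the normalized forms of the new entry and the stored entries would then treat "Dell" and "Dell Inc." as the same name, while leaving names like "Johnson and Johnson" untouched.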

OK, thanks. I might just split the name on spaces, then compare each piece.
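Splitting on spaces and comparing the pieces could look something like this. It is only a sketch in Python, and the helper names `shared_words` and `possible_duplicates` are invented for the example.

```python
def shared_words(new_name, existing_name):
    """Return the words the two names have in common, ignoring case."""
    return set(new_name.lower().split()) & set(existing_name.lower().split())

def possible_duplicates(new_name, existing_names):
    """List the existing names that share at least one word with new_name."""
    return [name for name in existing_names if shared_words(new_name, name)]

existing = ["Dell", "Sun", "Johnson and Johnson"]
print(possible_duplicates("Dell Inc", existing))  # ['Dell']
```

Note that any shared word counts as a hit, so common words such as "and" or "Inc" would flag unrelated companies unless they are filtered out first.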

[quote author=effigy link=topic=108075.msg434433#msg434433 date=1158266064]
This is challenging in a few ways. If "Dell Inc" were already in the database, it would be easy to match "Dell" against it, because "Dell" is a substring of "Dell Inc". The reverse is not true--"Dell Inc" will never match a stored "Dell". The next approach would be to split the string on whitespace and look for each piece, but how do you know which parts are essential to the company name? We know "Inc" can be dropped from "Dell Inc", but what about longer company names such as "Johnson and Johnson"? My only thought at the moment is to make a list of known suffixes and their variations--e.g., Corporation, Corp., Inc., Incorporated, etc.--and strip these from the end of the string before running the match.
[/quote]

I have successfully implemented something quite like this.

You need to compare values of the same length.

If DELL is in the database and you are entering DELL INC, you only want to compare the first 4 letters and see if they're the same.

You can also use PHP's similar_text function, which takes two values and can report the percentage that is the same, and reject new entries above a certain threshold.

While DELL and DELL INC are the same, DELL INC and DELL FINANCIAL are not, so simply throwing away the last part will not help.
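Both ideas can be sketched roughly as follows. This is Python, not PHP: `difflib.SequenceMatcher` stands in for similar_text and uses a different algorithm, so the percentages will not match what PHP reports. The function names and the threshold of 60 are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def same_prefix(new_name, existing_name):
    """Compare only as many leading characters as the shorter name has."""
    a, b = new_name.lower(), existing_name.lower()
    n = min(len(a), len(b))
    return a[:n] == b[:n]

def similarity_percent(a, b):
    """Case-insensitive similarity between two strings, from 0 to 100."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def looks_like_duplicate(new_name, existing_names, threshold=60.0):
    """Return existing names whose similarity to new_name meets the threshold."""
    return [name for name in existing_names
            if similarity_percent(new_name, name) >= threshold]

print(same_prefix("DELL INC", "DELL"))            # True
print(same_prefix("DELL INC", "DELL FINANCIAL"))  # False
print(looks_like_duplicate("Dell Inc", ["Dell", "Sun"]))  # ['Dell']
```

The threshold is arbitrary here and would need tuning against real entries; a similarity score alone may still rate names like DELL INC and DELL FINANCIAL as close, so it is probably best used to flag entries for review rather than to reject them outright.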

Thanks.

I can see a few problems with this, so I don't think I'll go this route.

I have "Sun" listed, but not "Sun Microsystems".

[quote author=bholbrook link=topic=108075.msg434482#msg434482 date=1158271780]
I have successfully implemented something quite like this.

You need to compare values of the same length.

If DELL is in the database and you are entering DELL INC, you only want to compare the first 4 letters and see if they're the same.

You can also use PHP's similar_text function, which takes two values and can report the percentage that is the same, and reject new entries above a certain threshold.

While DELL and DELL INC are the same, DELL INC and DELL FINANCIAL are not, so simply throwing away the last part will not help.
[/quote]
