What is the best method?


solarisuser

Before I add a new entry, I need to check it against what is already in the database and prevent a near-duplicate from being added.

For example, I already have "Dell" in the database, and someone wants to enter "Dell Inc".
I tried MySQL's LIKE and REGEXP operators, but they did not help.

My goal is to compare "Dell Inc" to "Dell" and detect whether they are similar.

Thanks

This is challenging in a few ways. If "Dell Inc" were already in the database, it would be easy to match "Dell" within "Dell Inc". The reverse is not: searching for "Dell Inc" will never match a stored "Dell". The next approach would be to split the string on whitespace and look for each piece, but how do you know which parts are essential to the company name? We know "Inc" can be dropped from "Dell Inc", but what about longer company names such as "Johnson and Johnson"? My only thought at the moment is to make a list of known suffixes and their variations--e.g., Corporation, Corp., Inc., Incorporated, etc.--and strip these from the end of the string before running the match.
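The suffix-stripping idea could look roughly like the sketch below. It is written in Python only for illustration (the thread itself is about PHP and MySQL), and the suffix list and the function name `normalize_company` are assumptions, not anything from the original posts.

```python
import re

# Hypothetical suffix list (an assumption); extend it for your own data.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation",
                  "co", "company", "ltd", "llc"}


def normalize_company(name):
    """Lowercase the name, drop punctuation, and strip trailing legal suffixes."""
    # Replace anything that is not a word character or whitespace with a space.
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    # Strip suffixes from the end only, and never remove the last remaining word.
    while len(words) > 1 and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


print(normalize_company("Dell Inc."))
print(normalize_company("Johnson and Johnson"))
```

Normalizing both the stored names and the new entry this way means "Dell" and "Dell Inc." compare as equal, while "Johnson and Johnson" is left intact because none of its words are in the suffix list.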

OK, thanks. I might just split the name on spaces, then compare each piece.
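A minimal sketch of that split-and-compare idea follows, again in Python purely for illustration. The function names are hypothetical, and "shares at least one word" is an assumed matching rule rather than something stated in the thread.

```python
def shared_words(new_name, existing_name):
    """Return the set of words (case-insensitive) that both names contain."""
    new_words = set(new_name.lower().split())
    existing_words = set(existing_name.lower().split())
    return new_words & existing_words


def find_candidates(new_name, existing_names):
    """List existing names that share at least one word with new_name."""
    return [name for name in existing_names if shared_words(new_name, name)]


# "Acme Inc" is flagged only because of the shared word "Inc".
print(find_candidates("Dell Inc", ["Dell", "Sun", "Acme Inc"]))
```

The example also shows the weakness raised earlier: generic words such as "Inc" produce false matches unless they are filtered out first.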


I have successfully implemented something quite like this.

You need to compare values of the same length.

If DELL is in the database and you are entering DELL INC, you only want to compare the first 4 letters and see whether they are the same.
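As a rough illustration of the same-length comparison, the sketch below trims both names to the length of the shorter one before comparing. It is in Python for illustration only, and `same_prefix` is a hypothetical helper name.

```python
def same_prefix(new_name, existing_name):
    """Compare only as many leading characters as the shorter name has."""
    a = new_name.strip().lower()
    b = existing_name.strip().lower()
    n = min(len(a), len(b))
    # Empty names never count as a match.
    return n > 0 and a[:n] == b[:n]


print(same_prefix("DELL INC", "DELL"))
print(same_prefix("DELL INC", "DELL FINANCIAL"))
```

Note that a plain character prefix also treats "Sunoco" as matching "Sun", so a word-boundary check may be worth adding.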

You can also use PHP's similar_text function, which takes two strings and can report the percentage by which they are similar (through its optional third argument), and reject new entries that score above a certain threshold.
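The threshold idea might be sketched as follows. This is Python rather than PHP, and it uses the standard library's `difflib.SequenceMatcher` as a stand-in for `similar_text`; the two are related but not identical measures, so scores will not necessarily agree. The 60% threshold and the function names are assumptions chosen only for the example.

```python
from difflib import SequenceMatcher


def similarity_percent(a, b):
    """Return a 0-100 similarity score for two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100


def is_probable_duplicate(new_name, existing_names, threshold=60.0):
    """True if any existing name scores at or above the (assumed) threshold."""
    return any(similarity_percent(new_name, name) >= threshold
               for name in existing_names)


print(is_probable_duplicate("Dell Inc", ["Dell", "Sun"]))
```

Whatever threshold is chosen would need tuning against real data, since short names shift the score a lot with only a few extra characters.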

While DELL and DELL INC are the same company, DELL INC and DELL FINANCIAL are not, so simply dropping the last part will not help.

Thanks.

I can see a few problems with this approach, so I don't think I'll use it.

For example, I have "Sun" listed, but not "Sun Microsystems".
