PHP script to remove duplicate sql entries

dombrorj · November 5, 2010

I have a database that collects visitor's info and am trying to figure out how I can execute a php script that will search and remove any duplicate entries.

I have table 1 that has a list of unique usernames. Table 2 adds usernames that may or may not be in table 1. I need to keep all usernames in table 1, but delete any that also appear in table 2.

There are also some instances where the username is added twice to table 2, but not included in table 1. In that case, I need to remove the duplicate entries.

I'm really not sure where to begin with this and hoping someone will be willing to to help me out (newbie alert!) Thanks!

OldWest · November 5, 2010

What format is the database? MySQL? Access? etc.. Or is it a spreadsheet like Excel?

dombrorj · November 5, 2010

Sorry... it's a MySQL database.

OldWest · November 5, 2010

Are you using phpMyAdmin to view the database entries?

dombrorj · November 5, 2010

Yep, I'm looking at the tables in phpmyadmin now scratching my head.

dombrorj · November 5, 2010

I'm realizing I probably wasn't clear enough earlier so I'll try again.

And Table 1...

Index | username

Table 2 looks like this...

id | username | email

So I don't want to remove anything from table 1, but

if the username is in 1 and in 2, remove username from table 2.
if multiple instances of username is in table 2 remove the duplicates and leave one entry
if multiple instances of email is in table 2, remove the duplicates but leave one entry

I was thinking a php script so I can set a cron to run it periodically and clean up the duplicates. Let me know if I left out any other details.

Thanks!

The Little Guy · November 5, 2010

I have used this code before:

delete from table1
USING table1, table1 as vtable
WHERE (NOT table1.ID=vtable.ID)
AND (table1.field_name=vtable.field_name)

OldWest · November 5, 2010

There is a tab in phpMyAdmin titled "SQL". Click on that. There will be a query in the textbox. Delete that and copy in the code that "The Little Guy" posted AT YOUR OWN RISK! ... But that's the idea.. Make sure you have a complete backup of your database and all table data etc before running any SQL on your stuff.. You can make a complete backup under tab "EXPORT" - click the checkbox "Save file as" and then run the export by clicking "Go".. SERIOUSLY USE THIS AT YOUR OWN RISK. YOU COULD LOSE ALL OF YOUR DATA IF THIS IS NOT DONE CORRECTLY! YOU'VE BEEN WARNED!

OldWest · November 5, 2010

You will need to modify the code The Little Guy posted as well based on your table names etc.. And probably need to tweak that around some as well to do exactly what you need.

YOU SHOULD SERIOUSLY TAKE THIS POST TO THE SQL FORUM TO VERIFY THE EXACT SQL YOU SHOULD USE TO RUN ON YOUR TABLES.

dombrorj · November 5, 2010

Perfect! Backup complete, and now i'll give the code a shot.

Thanks OldWest and Little Guy. Much appreciated.

dombrorj · November 5, 2010

I posted a similar thread in the php forum thinking I needed a php script, but was recommended to ask this question over here.

I have a SQL database that collects visitor's info and am trying to figure out how to search and remove any duplicate entries. It's a little tricky because I some duplicates are may be on two seperate tables. Here's what's going on...

Table 1 Looks Like This...

Index | username

Table 2 looks like this...

id | username | email

I don't want to remove anything from table 1 ever, but

* if the username is in 1 and in 2, remove username from table 2.

* if multiple instances of username is in table 2 remove the duplicates and leave one entry (if its not in table 1)

* if multiple instances of email is in table 2, remove the duplicates but leave one entry

The Little Guy suggested this code to tweak, but can use some assistance with that.

delete from table1
USING table1, table1 as vtable
WHERE (NOT table1.ID=vtable.ID)
AND (table1.field_name=vtable.field_name)

I was originaly thinking I needed a php script so I can set a cron to run a process periodically to clean up the duplicates, but I think I first need to figure out how to do this from within phpmyadmin. Let me know if I left out any other details.

I'm kinda at a loss here, and any help is really appreciated.

Thanks!

akitchin · November 5, 2010

well, the first query should be pretty simple. you can just delete from the second table where any username is in the entire first table:

DELETE FROM table2 WHERE username IN (SELECT DISTINCT username FROM table1)

here is a link containing some code that will help you figure out how to remove similar records (that is, records containing a duplicate value in one column):

http://www.cryer.co.uk/brian/sql/sql_delete_duplicates.htm#DeleteSimilarRecords

the code he gives is to delete all similar records, while retaining the one with the lowest `uniqueField`. that's something you can change as per your requirements, depending on which record you want to keep in case of duplicate values.

you can run the first query, followed by the second and third in whichever order you choose to. hope this helps.

dombrorj · November 5, 2010

Hey akitchin,

Thanks for the help and for pointing me to the site explaining how to remove similar values. That was helpful, but need just a little bit more help...

I did exactly as you have here for the first query, making sure the tables and labels were accurate, and it didn't seem to work. I just received message: "0 row(s) deleted" even though duplicates between the two tables do exist. Any thoughts on what may have gone wrong here?

DELETE FROM table2 WHERE username IN (SELECT DISTINCT username FROM table1)

As for the Little Gut code, his code worked except I'm realizing now it has to be a bit more specific. The table looks like this:

id | username | email

In some instances, the 'username' field is blank. So the code given deletes all of those lines. The other issue is that it deletes the entire row. But, if it has a value for 'email' I want to keep that email information there (so just remove 'username' from the row). If there is no value for 'email' then it can delete the row.

Here's the code again from The Little Guy that is deleting the row entirely, including blank fields:

delete from table1
USING table1, table1 as vtable
WHERE (NOT table1.ID=vtable.ID)
AND (table1.username=vtable.username)

Thanks again!

Pikachu2000 · November 5, 2010

I've merged the two essentially identical topics into this one thread.

akitchin · November 5, 2010

regarding the first query (to delete duplicates from table2 that exist in table1): are you certain that the "username" columns are identical? is there a spacing difference (e.g. does one consistently have a leading or trailing space)?

as for the second issue of duplicates WITHIN table2, i think some example data would help here. are you saying you want to keep any rows with duplicate usernames but non-empty email fields, while setting the duplicate username values to NULL and vice versa? if so, you may want to reconsider how you've designed your table...

EDIT: thanks pikachu, i (wrongly) assumed the other thread was deleted.

dombrorj · November 5, 2010

Thanks for merging Pikachu2000. Didn't meant to try to scream for attention with 2 threads. It was just suggested to me to start a new thread in the other forum.

akitchin, I double checked both tables and the labels are identical. Alll lowercase, no extra spaces. The only difference is that table 1 has columns and table 2 has 3 columns.

For the second issue, yeah... I agree it's probably a design flaw from the programmer. Here's some sample data. Here's some example data:

Table 1 (don't want to delete anything here. All unique values that were imported):

index | username

1 | dave

2 | john

3 | steve

4 | jim

5 | rob

Table 2 can look like this...

index | username | email

1 | dave | (blank)

.

35 | dave | [email protected]

.

90 | (blank) | [email protected]

I need to keep 1 instance of any of those three variations, but not multiple instances. Does that make sense (even though it probably doesn't make sense from a design perspective).

Thanks again for the assistance here.

OldWest · November 5, 2010

Pikachu2000, it was my bad on suggesting a new thread ... I just thought due to the nature it was better suited for the MySQL forum.. Apologies for the extra leg work.

Pikachu2000 · November 5, 2010

No need to apologize. It got taken care of . . .

Sign In

PHP script to remove duplicate sql entries

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Archived

Important Information