Jump to content

Recommended Posts

I have a database that collects visitor's info and am trying to figure out how I can execute a php script that will search and remove any duplicate entries.

 

I have table 1 that has a list of unique usernames. Table 2 adds usernames that may or may not be in table 1. I need to keep all usernames in table 1, but delete any that also appear in table 2.

 

There are also some instances where the username is added twice to table 2, but not included in table 1. In that case, I need to remove the duplicate entries.

 

I'm really not sure where to begin with this and hoping someone will be willing to to help me out (newbie alert!) Thanks!

I'm realizing I probably wasn't clear enough earlier so I'll try again.

 

And Table 1...

Index | username

 

Table 2 looks like this...

id | username | email

 

So I don't want to remove anything from table 1, but

  • if the username is in 1 and in 2, remove username from table 2.
  • if multiple instances of username is in table 2 remove the duplicates and leave one entry
  • if multiple instances of email is in table 2, remove the duplicates but leave one entry

 

I was thinking a php script so I can set a cron to run it periodically and clean up the duplicates. Let me know if I left out any other details.

 

Thanks!

There is a tab in phpMyAdmin titled "SQL". Click on that. There will be a query in the textbox. Delete that and copy in the code that "The Little Guy" posted AT YOUR OWN RISK! ... But that's the idea.. Make sure you have a complete backup of your database and all table data etc before running any SQL on your stuff.. You can make a complete backup under tab "EXPORT" - click the checkbox "Save file as" and then run the export by clicking "Go".. SERIOUSLY USE THIS AT YOUR OWN RISK. YOU COULD LOSE ALL OF YOUR DATA IF THIS IS NOT DONE CORRECTLY! YOU'VE BEEN WARNED!

You will need to modify the code The Little Guy posted as well based on your table names etc.. And probably need to tweak that around some as well to do exactly what you need.

 

YOU SHOULD SERIOUSLY TAKE THIS POST TO THE SQL FORUM TO VERIFY THE EXACT SQL YOU SHOULD USE TO RUN ON YOUR TABLES.

I posted a similar thread in the php forum thinking I needed a php script, but was recommended to ask this question over here.

 

I have a SQL database that collects visitor's info and am trying to figure out how to search and remove any duplicate entries. It's a little tricky because I some duplicates are may be on two seperate tables. Here's what's going on...

 

Table 1 Looks Like This...

Index | username

 

Table 2 looks like this...

id | username | email

 

I don't want to remove anything from table 1 ever, but

 

    * if the username is in 1 and in 2, remove username from table 2.

    * if multiple instances of username is in table 2 remove the duplicates and leave one entry (if its not in table 1)

    * if multiple instances of email is in table 2, remove the duplicates but leave one entry

 

The Little Guy suggested this code to tweak, but can use some assistance with that.

 

delete from table1
USING table1, table1 as vtable
WHERE (NOT table1.ID=vtable.ID)
AND (table1.field_name=vtable.field_name)

 

 

I was originaly thinking I needed a php script so I can set a cron to run a process periodically to clean up the duplicates, but I think I first need to figure out how to do this from within phpmyadmin. Let me know if I left out any other details.

 

I'm kinda at a loss here, and any help is really appreciated.

 

Thanks!

well, the first query should be pretty simple. you can just delete from the second table where any username is in the entire first table:

 

DELETE FROM table2 WHERE username IN (SELECT DISTINCT username FROM table1)

 

here is a link containing some code that will help you figure out how to remove similar records (that is, records containing a duplicate value in one column):

 

http://www.cryer.co.uk/brian/sql/sql_delete_duplicates.htm#DeleteSimilarRecords

 

the code he gives is to delete all similar records, while retaining the one with the lowest `uniqueField`. that's something you can change as per your requirements, depending on which record you want to keep in case of duplicate values.

 

you can run the first query, followed by the second and third in whichever order you choose to. hope this helps.

Hey akitchin,

 

Thanks for the help and for pointing me to the site explaining how to remove similar values. That was helpful, but need just a little bit more help...

 

I did exactly as you have here for the first query, making sure the tables and labels were accurate, and it didn't seem to work. I just received message: "0 row(s) deleted" even though duplicates between the two tables do exist. Any thoughts on what may have gone wrong here?

DELETE FROM table2 WHERE username IN (SELECT DISTINCT username FROM table1)

 

As for the Little Gut code, his code worked except I'm realizing now it has to be a bit more specific. The table looks like this:

id | username | email

 

In some instances, the 'username' field is blank. So the code given deletes all of those lines. The other issue is that it deletes the entire row. But, if it has a value for 'email' I want to keep that email information there (so just remove 'username' from the row). If there is no value for 'email' then it can delete the row.

 

Here's the code again from The Little Guy that is deleting the row entirely, including blank fields:

delete from table1
USING table1, table1 as vtable
WHERE (NOT table1.ID=vtable.ID)
AND (table1.username=vtable.username)

 

Thanks again!

regarding the first query (to delete duplicates from table2 that exist in table1): are you certain that the "username" columns are identical? is there a spacing difference (e.g. does one consistently have a leading or trailing space)?

 

as for the second issue of duplicates WITHIN table2, i think some example data would help here. are you saying you want to keep any rows with duplicate usernames but non-empty email fields, while setting the duplicate username values to NULL and vice versa? if so, you may want to reconsider how you've designed your table...

 

EDIT: thanks pikachu, i (wrongly) assumed the other thread was deleted.

Thanks for merging Pikachu2000. Didn't meant to try to scream for attention with 2 threads. It was just suggested to me to start a new thread in the other forum.

 

akitchin, I double checked both tables and the labels are identical. Alll lowercase, no extra spaces. The only difference is that table 1 has columns and table 2 has 3 columns.

 

For the second issue, yeah... I agree it's probably a design flaw from the programmer. Here's some sample data. Here's some example data:

 

Table 1 (don't want to delete anything here. All unique values that were imported):

index | username

1 | dave

2 | john

3 | steve

4 | jim

5 | rob

 

Table 2 can look like this...

index | username | email

1 | dave | (blank)

.

.

.

.

35 | dave | dave@email.com

.

.

.

90 | (blank) | dave@email.com

 

 

I need to keep 1 instance of any of those three variations, but not multiple instances. Does that make sense (even though it probably doesn't make sense from a design perspective).

 

Thanks again for the assistance here.

 

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.