Jump to content

Removing duplicates in a database?


subnet_rx

Recommended Posts

I'm trying to remove duplicate records in a database.  They will have a unique primary key, but there's another field in the database that should be unique for each person.  So, if the same $row['item_id'] exists for the same $row['userid'] on two rows, then I need to delete one of them.  What would be the best way of doing this? 

Link to comment
Share on other sites

I had to do this a while back

 

May not be the best way but I queries the table, put all the entries that I wanted to be unique in an array

 

It did this via a loop, it would use in_array() to see if it was already in there, if it was, it added that value to another array which was used to store duplicate records

 

At the end, I had an array with all duplicate records, from there, delete all records using your array with dupe entries in

Link to comment
Share on other sites

Assuming you don't care about any of the "different" info the remaining columns, adding a unique index on that column will prune the dupes.

Are you sure?

 

When I came accross this issue it was attempting to make a field unique

 

It popped up with an error telling me there was duplicate entries, hence having to write a script to dig them out before executing the unique business

 

I dont think it will prune as a default unless there is a more explicit approach to forcefull make a field unique regardless

Link to comment
Share on other sites

Try it.. I bet you it doesnt ;)

 

Unless its a new feature with an updated version from 4.1 or a setting

 

It would be stupid for mysql to have this feature incorporated..by default, it leaves the door open for unintentional data loss

 

When trying your method above I definately had to write a php script to prune, mysql kicked up a fuss

Link to comment
Share on other sites

Thanks for the replies, I'm just getting around to doing this.  I don't think that the unique field idea is going to work for me, since it's not really unique.  For example, the two columns I'm talking about might look like this:

 

item_id  user_id

1000      1

1002      1

1003      1

1002      2

1002      2

 

 

So, I would want to delete one of the rows where the item_id and user_id matches on two different rows.  Basically, my form checking wasn't good enough, and some people entered their form twice when this shouldn't be possible. 

Link to comment
Share on other sites

  • 3 weeks later...

According to the refman:

IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key, The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.