subnet_rx Posted October 1, 2007 I'm trying to remove duplicate records in a database. Each record has a unique primary key, but there's another field that should be unique for each person. So, if the same $row['item_id'] exists for the same $row['userid'] on two rows, I need to delete one of them. What would be the best way of doing this?
jaymc Posted October 1, 2007 I had to do this a while back. It may not be the best way, but I queried the table and, in a loop, put every value that was supposed to be unique into an array. It used in_array() to see if each value was already in there; if it was, it added that value to a second array used to store the duplicate records. At the end I had an array of all the duplicate records, and from there you delete every record listed in that dupe array.
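The loop jaymc describes can be sketched like this (Python standing in for the original PHP; the row tuples and column names are made up for illustration, and a set replaces the in_array() membership test):

```python
# Sketch of the duplicate-detection loop described above (Python in place
# of the original PHP; `rows` stands in for a SELECT result of
# (primary_key, item_id, user_id) tuples).
def find_duplicate_rows(rows):
    seen = set()        # (item_id, user_id) pairs already encountered
    duplicates = []     # primary keys of rows that repeat a pair
    for pk, item_id, user_id in rows:
        key = (item_id, user_id)
        if key in seen:
            duplicates.append(pk)   # later occurrence: mark for deletion
        else:
            seen.add(key)
    return duplicates

rows = [
    (1, 1000, 1),
    (2, 1002, 1),
    (3, 1003, 1),
    (4, 1002, 2),
    (5, 1002, 2),   # duplicate of row 4
]
print(find_duplicate_rows(rows))  # → [5]
```

The primary keys collected this way can then be fed into a single DELETE ... WHERE id IN (...) query, which keeps the first occurrence of each pair and removes the later ones.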
fenway Posted October 1, 2007 Assuming you don't care about any of the "different" info in the remaining columns, adding a unique index on that column will prune the dupes.
jaymc Posted October 1, 2007 Assuming you don't care about any of the "different" info in the remaining columns, adding a unique index on that column will prune the dupes. Are you sure? When I came across this issue while attempting to make a field unique, it popped up with an error telling me there were duplicate entries, hence having to write a script to dig them out before executing the unique business. I don't think it will prune by default, unless there is a more explicit approach to forcefully make a field unique regardless.
fenway Posted October 1, 2007 Yes, I'm sure doing this with ALTER TABLE works.
jaymc Posted October 1, 2007 Try it.. I bet you it doesn't, unless it's a new feature in a version after 4.1, or a setting. It would be stupid for MySQL to have this feature incorporated by default; it leaves the door open for unintentional data loss. When trying your method above I definitely had to write a PHP script to prune; MySQL kicked up a fuss.
fenway Posted October 1, 2007 There's an excellent article on other ways to do this... if you want to be lazy, just make a new table, add a unique key, and use INSERT INTO ... SELECT ... WHERE with IGNORE.
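The "lazy" new-table approach can be sketched with SQLite's INSERT OR IGNORE, which behaves like MySQL's INSERT IGNORE here (SQLite stands in for MySQL, and the table and column names are invented for the example):

```python
import sqlite3

# In-memory database standing in for the real MySQL one.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE purchases (id INTEGER PRIMARY KEY, item_id INT, user_id INT)")
cur.executemany("INSERT INTO purchases (item_id, user_id) VALUES (?, ?)",
                [(1000, 1), (1002, 1), (1003, 1), (1002, 2), (1002, 2)])

# New table with a unique key on the column(s) that must not repeat,
# then copy with OR IGNORE so duplicate-key rows are silently skipped.
cur.execute("""CREATE TABLE purchases_clean (
                   id INTEGER PRIMARY KEY, item_id INT, user_id INT,
                   UNIQUE (item_id, user_id))""")
cur.execute("INSERT OR IGNORE INTO purchases_clean SELECT * FROM purchases")

print(cur.execute("SELECT item_id, user_id FROM purchases_clean").fetchall())
# → [(1000, 1), (1002, 1), (1003, 1), (1002, 2)]
```

After the copy, the old table can be dropped and the clean one renamed; only the first row of each duplicated pair survives.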
subnet_rx Posted October 3, 2007 Author Thanks for the replies, I'm just getting around to doing this. I don't think the unique-field idea is going to work for me, since neither field is really unique on its own. For example, the two columns I'm talking about might look like this:

item_id  user_id
1000     1
1002     1
1003     1
1002     2
1002     2

So, I would want to delete one of the rows where the item_id and user_id match on two different rows. Basically, my form checking wasn't good enough, and some people submitted their form twice when this shouldn't be possible.
fenway Posted October 3, 2007 You can have a unique index that spans multiple columns...
Illusion Posted October 3, 2007 Yes, adding the IGNORE keyword will suppress all the errors, keep the record with the first occurrence of each combination, and delete the rest of them.
subnet_rx Posted October 3, 2007 Author You can have a unique index that spans multiple columns... I guess I need to read more on this. I don't want to keep only the first occurrence overall; in my previous example, I'd just delete the second 1002/2 row and keep the other occurrences of 1002.
fenway Posted October 4, 2007 Right... but you wouldn't have a second 1002/2 row if the unique index was present.
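Fenway's point can be seen directly with a small experiment (SQLite standing in for MySQL; the table and index names are illustrative): once a unique index spans both columns, a second (1002, 2) row is rejected at insert time, while (1002, 1) and (1002, 2) can coexist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE entries (item_id INT, user_id INT)")
# Unique index spanning both columns: each (item_id, user_id) *pair*
# may appear only once, but each value may repeat across pairs.
cur.execute("CREATE UNIQUE INDEX uniq_pair ON entries (item_id, user_id)")

cur.execute("INSERT INTO entries VALUES (1002, 1)")
cur.execute("INSERT INTO entries VALUES (1002, 2)")      # fine: different user
try:
    cur.execute("INSERT INTO entries VALUES (1002, 2)")  # same pair again
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

So the composite index gives exactly the behavior the OP wants: duplicate item_id values are allowed, duplicate (item_id, user_id) pairs are not.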
subnet_rx Posted October 22, 2007 Author Ok, I'm working on this again today, and I think I've got what you're saying. When I add a unique index to these columns it gives me errors because there are duplicate entries. How can I force the index and remove the extras all at once?
fenway Posted October 22, 2007 According to the refman: IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.