subnet_rx Posted October 1, 2007 I'm trying to remove duplicate records in a database. Each record has a unique primary key, but there's another field that should be unique for each person. So, if the same $row['item_id'] exists for the same $row['userid'] on two rows, I need to delete one of them. What would be the best way of doing this?
jaymc Posted October 1, 2007 I had to do this a while back. It may not be the best way, but I queried the table and, in a loop, put every value that was supposed to be unique into an array. It used in_array() to see if each value was already in there; if it was, it added that value to a second array used to store the duplicate records. At the end I had an array of all the duplicate records, and from there you delete every record listed in that dupe array.
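The loop jaymc describes can be sketched like this (Python standing in for the original PHP; the row tuples and column names are made up for illustration, and a set replaces the in_array() membership test):

```python
# Sketch of the duplicate-detection loop described above (Python in place
# of the original PHP; `rows` stands in for a SELECT result of
# (primary_key, item_id, user_id) tuples).
def find_duplicate_rows(rows):
    seen = set()        # (item_id, user_id) pairs already encountered
    duplicates = []     # primary keys of rows that repeat a pair
    for pk, item_id, user_id in rows:
        key = (item_id, user_id)
        if key in seen:
            duplicates.append(pk)   # later occurrence: mark for deletion
        else:
            seen.add(key)
    return duplicates

rows = [
    (1, 1000, 1),
    (2, 1002, 1),
    (3, 1003, 1),
    (4, 1002, 2),
    (5, 1002, 2),   # duplicate of row 4
]
print(find_duplicate_rows(rows))  # → [5]
```

The primary keys collected this way can then be fed into a single DELETE ... WHERE id IN (...) query, which keeps the first occurrence of each pair and removes the later ones.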
fenway Posted October 1, 2007 Assuming you don't care about any of the "different" info in the remaining columns, adding a unique index on that column will prune the dupes.
jaymc Posted October 1, 2007 Assuming you don't care about any of the "different" info in the remaining columns, adding a unique index on that column will prune the dupes. Are you sure? When I came across this issue while attempting to make a field unique, it popped up with an error telling me there were duplicate entries, hence having to write a script to dig them out before executing the unique business. I don't think it will prune by default, unless there is a more explicit approach to forcefully make a field unique regardless.
fenway Posted October 1, 2007 Yes, I'm sure doing this with ALTER TABLE works.
jaymc Posted October 1, 2007 Try it.. I bet you it doesn't, unless it's a new feature in a version after 4.1, or a setting. It would be stupid for MySQL to have this feature incorporated by default; it leaves the door open for unintentional data loss. When trying your method above I definitely had to write a PHP script to prune; MySQL kicked up a fuss.
fenway Posted October 1, 2007 There's an excellent article on other ways to do this... if you want to be lazy, just make a new table, add a unique key, and use INSERT INTO ... SELECT ... WHERE with IGNORE.
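The "lazy" new-table approach can be sketched with SQLite's INSERT OR IGNORE, which behaves like MySQL's INSERT IGNORE here (SQLite stands in for MySQL, and the table and column names are invented for the example):

```python
import sqlite3

# In-memory database standing in for the real MySQL one.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE purchases (id INTEGER PRIMARY KEY, item_id INT, user_id INT)")
cur.executemany("INSERT INTO purchases (item_id, user_id) VALUES (?, ?)",
                [(1000, 1), (1002, 1), (1003, 1), (1002, 2), (1002, 2)])

# New table with a unique key on the column(s) that must not repeat,
# then copy with OR IGNORE so duplicate-key rows are silently skipped.
cur.execute("""CREATE TABLE purchases_clean (
                   id INTEGER PRIMARY KEY, item_id INT, user_id INT,
                   UNIQUE (item_id, user_id))""")
cur.execute("INSERT OR IGNORE INTO purchases_clean SELECT * FROM purchases")

print(cur.execute("SELECT item_id, user_id FROM purchases_clean").fetchall())
# → [(1000, 1), (1002, 1), (1003, 1), (1002, 2)]
```

After the copy, the old table can be dropped and the clean one renamed; only the first row of each duplicated pair survives.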
subnet_rx Posted October 3, 2007 Author Thanks for the replies, I'm just getting around to doing this. I don't think the unique-field idea is going to work for me, since neither field is really unique on its own. For example, the two columns I'm talking about might look like this:

item_id  user_id
1000     1
1002     1
1003     1
1002     2
1002     2

So, I would want to delete one of the rows where the item_id and user_id match on two different rows. Basically, my form checking wasn't good enough, and some people submitted their form twice when this shouldn't be possible.
fenway Posted October 3, 2007 You can have a unique index that spans multiple columns...
Illusion Posted October 3, 2007 Yes, adding the IGNORE keyword will suppress all the errors, keep the record with the first occurrence of each combination, and delete the rest of them.
subnet_rx Posted October 3, 2007 Author You can have a unique index that spans multiple columns... I guess I need to read more on this. I don't want to keep only the first occurrence overall; in my previous example, I'd just delete the second 1002/2 row and keep the other occurrences of 1002.
fenway Posted October 4, 2007 Right... but you wouldn't have a second 1002/2 row if the unique index was present.
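Fenway's point can be seen directly with a small experiment (SQLite standing in for MySQL; the table and index names are illustrative): once a unique index spans both columns, a second (1002, 2) row is rejected at insert time, while (1002, 1) and (1002, 2) can coexist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE entries (item_id INT, user_id INT)")
# Unique index spanning both columns: each (item_id, user_id) *pair*
# may appear only once, but each value may repeat across pairs.
cur.execute("CREATE UNIQUE INDEX uniq_pair ON entries (item_id, user_id)")

cur.execute("INSERT INTO entries VALUES (1002, 1)")
cur.execute("INSERT INTO entries VALUES (1002, 2)")      # fine: different user
try:
    cur.execute("INSERT INTO entries VALUES (1002, 2)")  # same pair again
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

So the composite index gives exactly the behavior the OP wants: duplicate item_id values are allowed, duplicate (item_id, user_id) pairs are not.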
subnet_rx Posted October 22, 2007 Author Ok, I'm working on this again today, and I think I've got what you're saying. When I add a unique index to these columns it gives me errors because there are duplicate entries. How can I force the index and remove the extras all at once?
fenway Posted October 22, 2007 According to the refman: IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.