Codexer Posted October 3, 2012 Share Posted October 3, 2012 Hey guys, First of all, wow! How did I not know about this forum? Unbelievable amount of knowledge here. Now I have a question regarding MySQL. I am trying to sort duplicate rows based on two fields from a table. This table has several million rows. Currently, I have the following: SELECT p1.id AS idone, p1.duration AS durationone, p1.title AS titleone, p1.status, p2.id AS idtwo, p2.duration AS durationtwo, p2.title AS titletwo, p2.status FROM table_one AS p1, table_one AS p2 WHERE p1.title = p2.title AND p1.duration = p2.duration AND p1.status != 0 AND p2.status != 0 AND p1.id != p2.id LIMIT 20 Output: idone | durationone | titleone | idtwo | durationtwo | titletwo Which will return rows which have the same title AND duration where the ID is different plus the status is not 0... The problem is, that it's quite slow. Especially when I want to return 50+ rows it reaches over 60 seconds. Ideally, I would like the query to be fast enough to return about 100 duplicate rows at a time for further action. Is there a better way of doing this? I believe it's the "AND"s that are slowing it down. As if you trim them back down, the query gets a lot faster. However, it also defeats the purpose. I would be fine with creating a temporary index for this task. I do realize that methods can be put into place to prevent this in the first place however it needs some human intervention. If you guys have any ideas on a better/faster way to retrieve this information, I would be highly appreciative! Quote Link to comment https://forums.phpfreaks.com/topic/269027-better-way-to-run-this-query-checking-for-dupes/ Share on other sites More sharing options...
Barand Posted October 3, 2012 Share Posted October 3, 2012 do you have indexes on title, duration, status? Quote Link to comment https://forums.phpfreaks.com/topic/269027-better-way-to-run-this-query-checking-for-dupes/#findComment-1382404 Share on other sites More sharing options...
Codexer Posted October 3, 2012 Author Share Posted October 3, 2012 I've got the primary on ID, however there's no index on title, duration or status. Should I create them as "one" index? Or individual? I will only leave the indices there whilst I perform this task, as the table is heavy on inserts. Cheers! Quote Link to comment https://forums.phpfreaks.com/topic/269027-better-way-to-run-this-query-checking-for-dupes/#findComment-1382408 Share on other sites More sharing options...
gizmola Posted October 3, 2012 Share Posted October 3, 2012 One index on title, duration, status, id. Quote Link to comment https://forums.phpfreaks.com/topic/269027-better-way-to-run-this-query-checking-for-dupes/#findComment-1382423 Share on other sites More sharing options...
fenway Posted October 3, 2012 Share Posted October 3, 2012 And show EXPLAIN before and after. Quote Link to comment https://forums.phpfreaks.com/topic/269027-better-way-to-run-this-query-checking-for-dupes/#findComment-1382430 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.