Jump to content

[SOLVED] Trying to Delete dupe rows but keep 1


revraz

Recommended Posts

Kinda stuck on how to approach this.

 

Table Layout:

 

id    cid    start        duration

1    123    123123    1000

2    123    123123    800

3    123    123123    500

4    123    125125    1000

5    234    333444    500

6    234    333444    800

 

So what I need to do is build a query that will delete duplicate rows.  Duplicate rows is defined as having the same cid and start values, and I need to keep the row with the highest duration.  So in the above example, I need to delete ID 2, 3, 5.

 

Can I do this with just SQL or do I need to use code?

Link to comment
Share on other sites

Does this look like it would be it (minus changing the select to delete).  I am not 100% sure on what to use as the Group By

 

select bad_rows.*

from log as bad_rows

  inner join (

      select cid, start, MAX(duration) as max_id

      from log

      group by start

      having count(*) > 1

  ) as good_rows on good_rows.start = bad_rows.start

      and good_rows.max_id <> bad_rows.duration;

Link to comment
Share on other sites

Actually I think I need to add a cid compare as well.

 

select bad_rows.*

from log as bad_rows

  inner join (

      select cid, start, MAX(duration) as max_id

      from log

      group by start

      having count(*) > 1

  ) as good_rows on good_rows.start = bad_rows.start

      and good_rows.cid = bad_rows.cid

      and good_rows.max_id <> bad_rows.duration;

Link to comment
Share on other sites

The following worked for me on your example data -

 

$query = "
select bad_rows.*
from log as bad_rows
   inner join (
      select cid, start, MAX(duration) as max_dur
      from log
      group by cid,start
      having count(*) > 1
   ) as good_rows on good_rows.cid = bad_rows.cid AND good_rows.start = bad_rows.start
      and good_rows.max_dur <> bad_rows.duration";

 

If you have the same duration values (cid,start, and duration all match) the query won't find them and delete them due to the <> comparison in the above. You would need to use a second query that just forms groups and deletes one of the rows when count(*) > 1

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.