Update duplicate rows

weee · November 21, 2010

Hello,

I am trying to write a query that will update duplicate rows. I searched online to find how to find duplicates when I stumbled accross this:

select address, count(address) as cnt
from mailing_list
group by address
having cnt > 1

This code finds the duplicates but how can I update a column to make them unique? For example, I would like to do something like this:

UPDATE mailing_list set address=rand()+houseNum
from mailing_list
group by address
having cnt > 1

Could someone please point me on the right track to do something like this?

Thank you!

weee

fenway · November 21, 2010

You need to use the former query as a sub-query of the latter.

weee · November 21, 2010

I am getting pretty close to the expected output I would like now.

Here is my query:

SELECT DISTINCT t1.a, t1.b, t1.c, d
FROM table1 AS t1
JOIN (
SELECT b, c
FROM table1
GROUP BY b
HAVING count( b ) >1
) AS t2 ON t1.b = t2.b
LEFT JOIN table2 AS c ON c.c = t1.c

My output looks like:

a b c d
1 2 3 4

2 3 4 4

3 4 3 15

Now my question is, how can I count how many rows are duplicates in column d?

So I would get an output like

a b c d e
1 2 3 4 2

2 3 4 4 2

3 4 3 15 1

Thank you for helping me with this

weee

fenway · November 21, 2010

Why not simple include count(b) as an aliased expression?

weee · November 21, 2010

I'm not quite sure if I understand what you mean?

fenway · November 22, 2010

I'm not quite sure if I understand what you mean?

Include "count(b) as e" in your subquery.

weee · November 22, 2010

I need to count d, not b. The problem is t1.d needs the left join to table2 in order to get the correct value. So, if I place count(d) as e in the subquery, it doesn't know what the value is.

fenway · November 22, 2010

Oh, sorry, I didn't see that -- then just wrap this in yet another sub-query.

weee · November 22, 2010

I am really confused lol So this sub query would be the second sub query and my current sub query would be a third sub query? I have never had to write a query like this before :-\

Thank you for helping me with this, I really appreciate it

weee

fenway · November 23, 2010

That's correct.

weee · November 23, 2010

SELECT DISTINCT t3.a, t3.b, t3.c, ci.d, count( ci.d ) AS e
FROM table1 AS t1
JOIN (

SELECT DISTINCT t1.a, t1.b, t1.c, lower( c.d ) 
FROM table1 AS t1
JOIN (

SELECT b, c
FROM table1
GROUP BY b
HAVING count( b ) >1) AS t2 ON t1.b = t2.b
LEFT JOIN table2 AS c ON c.c= t1.c
GROUP BY b
) AS t3 ON t3.b = t1.b
LEFT JOIN table2 AS ci ON ci.c= t3.c
GROUP BY t3.b

Now I am pretty close. The only problem is it is returning the incorrect number for e. This is returning the entire count of d rather than how many duplicates. For example, there could be 16 instances of d, but there are only 2 duplicates based off of the results without my third join.

Thanks again for helping, I really appreciate this.

weee

fenway · November 25, 2010

Use "COUNT( DISTINCT cd.d ) AS e" instead.

weee · November 25, 2010

Cool! Now the e column is being counted properly, however the duplicate rows are being dropped now.

This query showed the correct info:

SELECT DISTINCT t1.a, t1.b, t1.c, lower(c.d) as d
FROM table1 AS t1
JOIN (
	SELECT b, c
	FROM table1
	GROUP BY b 
	HAVING count( b ) >1
) AS t2 ON t1.b = t2.b
LEFT JOIN table2 AS c ON c.c = t1.c

but this one now shows the correct count, but dropped the duplicate row that had a unique value in c.

SELECT t3.a, t3.b, t3.c, ci.d, COUNT( DISTINCT t3.d ) AS e
FROM table1 AS t0
JOIN (
	SELECT DISTINCT t1.a, t1.b, t1.c, lower(c.d) as d
	FROM table1 AS t1
		JOIN (
			SELECT b, c
			FROM table1
			GROUP BY b 
			HAVING count( b ) >1
		) AS t2 ON t1.b = t2.b
	LEFT JOIN table2 AS c ON c.c = t1.c
) AS t3 ON t3.b = t0.b
LEFT JOIN table2 AS ci ON ci.c = t0.c
GROUP BY t0.b

Thank you again for helping, I really really appreciate it!

weee

fenway · November 25, 2010

That's confusing -- the only thing that could have changed was the count -- was it correct before?

weee · November 25, 2010

Once I added my last join statement, that was when I lost my duplicate rows. It looks like I may have glaced over the expected results a little too quick, my bad However, as I mentioned, the first query in my last post has the correct results. It is just a matter of counting up those rows.

Thanks again for helping me on this!

weee

P.S. Happy Thanksgiving if you celebrate it

fenway · November 25, 2010

Not sure how you can lose rows with a left join...

weee · November 25, 2010

Based off of this query:

SELECT DISTINCT t1.a, t1.b, t1.c, lower(c.d) as d
FROM table1 AS t1
   JOIN (
      SELECT b, c
      FROM table1
      GROUP BY b
      HAVING count( b ) >1
   ) AS t2 ON t1.b = t2.b
LEFT JOIN table2 AS c ON c.c = t1.c

What would you do to count d's as e where b's would be duplicates? Like I said in my last post, this query gets the output I would like, just cannot get the count of d's where b's are duplicates.

Sign In

Update duplicate rows

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Archived

Important Information