Jump to content

Help in displaying duplicates in database


Recommended Posts

I have a name & address database with 20,000 names that I would like to weed the duplicates from.  I have an admin menu that has a few ways to sort this database and would like to add a dupe check to this admin menu.  That is, I can sort the entries by last name, or by email address, etc.  What I  would like is to produce a list of dupes.  I will need to show dupes (only) by last name and then by first name.  I can also do this by showing only the dupes sorted on email address.  This is a bit over my head, so I need help with the query.  I am using PHP ver 5 and MySQL ver 5.

There's no point in displaying all the dups because they would all have the same name. I assume you want to display the primary key of the table to help locate those specific duplicates.

 

Here's an example of a query where it finds duplicate names. It assumes that the primary key of the table is called "id". It displays all the id's of the duplicate records for a particular name.

 

SELECT  GROUP_CONCAT(id) AS primary_keys, last_name, first_name, COUNT(CONCAT(last_name, first_name)) AS nbr_dups

FROM table_name

GROUP BY last_name, first_name

HAVING nbr_dups > 1

ORDER BY last_name, first_name

;

 

And you would do something similar for the email address:

 

SELECT  GROUP_CONCAT(id) AS primary_keys, email, COUNT(email) AS nbr_dups

FROM table_name

GROUP BY email

HAVING nbr_dups > 1

ORDER BY email

;

 

 

I would like all the columns to be displayed because there is information in those other columns that I would like to consolidate manually. 

I am not sure what "primary_keys" is in your script.  I used this variation of your script:

 

SELECT GROUP_CONCAT(recordid) AS lastname, firstname, COUNT(CONCAT(lastname, firstname)) AS nbr_dups

FROM tbl_data

GROUP BY lastname, firstname

HAVING nbr_dups > 1

ORDER BY lastname, firstname

 

I got a 2 column output that for each row, I got a first name and then sets of numbers, which I assume is the record number.  What changes do I need to make so all the columns are displayed for the dupes listed?

 

List the specific columns you want then, or use * to select all columns. Example:

 

SELECT td.*, GROUP_CONCAT(recordid) AS keys, lastname, firstname, COUNT(CONCAT(lastname, firstname)) AS nbr_dups

FROM tbl_data td

...

 

 

Let the manual be your friend:

 

http://dev.mysql.com/doc/refman/5.0/en/select.html

 

 

That is almost exactly what I want.  The latest shows the dupes, but shows each (lastname/firstname) dupe on only one row, with the lastname column filled with the recordid number(s).  If each dupe could be shown on its own row with the last name displayed, that would be absolutely perfect.

Please disregard my last post (as I cannot edit it).  I  have done some work and am almost there.  Here is what I have now:

 

SELECT *, COUNT(*)

FROM tbl_data

GROUP BY lastname, firstname

HAVING count(*) > 1

 

There is a slight problem.  This shows duplicate rows where the lastname and firstname are the same.  What I want is to display ALL dupes with the same first and last name.  For example, if my database contains 3 rows, all with the name of John Doe, the above query will only show one row that has John Doe.  I want all three rows to be displayed.  Any idea what needs to change?

No problem (untested):

 

SELECT t1.* 
FROM tbl_data AS t1
INNER JOIN (
SELECT t2.lastname, t2.firstname, COUNT(*) AS cnt
FROM tbl_data AS t2
GROUP BY t2.lastname, t2.firstname
HAVING cnt > 1 ) AS t2 ON ( t2.lastname = t1.lastname AND t2.firstname = t1.firstname)

 

--fenway: edited a typo

SELECT t1.*

FROM tbl_data AS t1

INNER JOIN (

SELECT t2.lastname, t2.firstname, COUNT(*) AS cnt

FROM tbl_data AS t2

GROUP BY t2.lastname, t2.firstname

HAVING cnt > 1 ) AS t2 ON ( t2.lastname = t1.lastname AND t2.firstname = t2.firstname)

 

Fantastic!  This worked after I made one very small change - change a "t2" to "t1". 

The last line should look like this:

HAVING cnt > 1 ) AS t2 ON ( t2.lastname = t1.lastname AND t2.firstname = t1.firstname)

Thanks so much.  This has really been a big help!

 

 

Would it be possible to split these results into 5 or 6 output lists?

 

Apparently the database and dupe list is getting pretty large and it takes about 2 minutes to display the results.  BTW - I have 20,000 names and now I have an accurate list of dupes at about 7,000.

 

I think I would like to try having the display show only the results of lastname that begin with a certain letter.  That is, display the results of all duplicates (from the query above) that have a last name that begins with the letter A.  What changes to the recordset would be needed?

Would it be possible to split these results into 5 or 6 output lists?

 

Apparently the database and dupe list is getting pretty large and it takes about 2 minutes to display the results.  BTW - I have 20,000 names and now I have an accurate list of dupes at about 7,000.

 

2 mins to display (php) or to query (mysql)?

 

I think I would like to try having the display show only the results of lastname that begin with a certain letter.  That is, display the results of all duplicates (from the query above) that have a last name that begins with the letter A.  What changes to the recordset would be needed?

You'd need to add a where clause... WHERE firstname LIKE 'B%'

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.