mysty Posted December 3, 2007 Share Posted December 3, 2007 I have a name & address database with 20,000 names that I would like to weed the duplicates from. I have an admin menu that has a few ways to sort this database and would like to add a dupe check to this admin menu. That is, I can sort the entries by last name, or by email address, etc. What I would like is to produce a list of dupes. I will need to show dupes (only) by last name and then by first name. I can also do this by showing only the dupes sorted on email address. This is a bit over my head, so I need help with the query. I am using PHP ver 5 and MySQL ver 5. Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/ Share on other sites More sharing options...
mysty Posted December 3, 2007 Author Share Posted December 3, 2007 I forgot to mention that I would like to have the output show all the dupes. That is, if the database contains 3 instances of John Doe, that the output would show all 3 instances of John Doe's entries. Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-404719 Share on other sites More sharing options...
toplay Posted December 3, 2007 Share Posted December 3, 2007 There's no point in displaying all the dups because they would all have the same name. I assume you want to display the primary key of the table to help locate those specific duplicates. Here's an example of a query where it finds duplicate names. It assumes that the primary key of the table is called "id". It displays all the id's of the duplicate records for a particular name. SELECT GROUP_CONCAT(id) AS primary_keys, last_name, first_name, COUNT(CONCAT(last_name, first_name)) AS nbr_dups FROM table_name GROUP BY last_name, first_name HAVING nbr_dups > 1 ORDER BY last_name, first_name ; And you would do something similar for the email address: SELECT GROUP_CONCAT(id) AS primary_keys, email, COUNT(email) AS nbr_dups FROM table_name GROUP BY email HAVING nbr_dups > 1 ORDER BY email ; Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-404781 Share on other sites More sharing options...
mysty Posted December 3, 2007 Author Share Posted December 3, 2007 I would like all the columns to be displayed because there is information in those other columns that I would like to consolidate manually. I am not sure what "primary_keys" is in your script. I used this variation of your script: SELECT GROUP_CONCAT(recordid) AS lastname, firstname, COUNT(CONCAT(lastname, firstname)) AS nbr_dups FROM tbl_data GROUP BY lastname, firstname HAVING nbr_dups > 1 ORDER BY lastname, firstname I got a 2 column output that for each row, I got a first name and then sets of numbers, which I assume is the record number. What changes do I need to make so all the columns are displayed for the dupes listed? Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-404803 Share on other sites More sharing options...
toplay Posted December 3, 2007 Share Posted December 3, 2007 List the specific columns you want then, or use * to select all columns. Example: SELECT td.*, GROUP_CONCAT(recordid) AS keys, lastname, firstname, COUNT(CONCAT(lastname, firstname)) AS nbr_dups FROM tbl_data td ... Let the manual be your friend: http://dev.mysql.com/doc/refman/5.0/en/select.html Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-404804 Share on other sites More sharing options...
mysty Posted December 3, 2007 Author Share Posted December 3, 2007 That is almost exactly what I want. The latest shows the dupes, but shows each (lastname/firstname) dupe on only one row, with the lastname column filled with the recordid number(s). If each dupe could be shown on its own row with the last name displayed, that would be absolutely perfect. Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-404831 Share on other sites More sharing options...
mysty Posted December 4, 2007 Author Share Posted December 4, 2007 I read the link for the attached, but have tried a few things, all with the same results: Error messages. Can someone help with the next step? Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-405741 Share on other sites More sharing options...
fenway Posted December 4, 2007 Share Posted December 4, 2007 You can't have everything "on its own row" and use GROUP_CONCAT(). Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-406113 Share on other sites More sharing options...
mysty Posted December 5, 2007 Author Share Posted December 5, 2007 Is this even possible? If so, how? Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-406606 Share on other sites More sharing options...
fenway Posted December 5, 2007 Share Posted December 5, 2007 Well, if you take Barand's example, and drop the group concat, you'll get something quite close... you can use this to join back to the original table. Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-406913 Share on other sites More sharing options...
mysty Posted December 5, 2007 Author Share Posted December 5, 2007 Do you mean like this: SELECT lastname, firstname COUNT lastname, firstname AS nbr_dups FROM tbl_data GROUP BY lastname, firstname HAVING nbr_dups > 1 ORDER BY lastname, firstname Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-407369 Share on other sites More sharing options...
mysty Posted December 6, 2007 Author Share Posted December 6, 2007 Please disregard my last post (as I cannot edit it). I have done some work and am almost there. Here is what I have now: SELECT *, COUNT(*) FROM tbl_data GROUP BY lastname, firstname HAVING count(*) > 1 There is a slight problem. This shows duplicate rows where the lastname and firstname are the same. What I want is to display ALL dupes with the same first and last name. For example, if my database contains 3 rows, all with the name of John Doe, the above query will only show one row that has John Doe. I want all three rows to be displayed. Any idea what needs to change? Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-407410 Share on other sites More sharing options...
fenway Posted December 6, 2007 Share Posted December 6, 2007 What you need to do is select just last_name, first_name from that query, then join it back to the tbl_data with these two columns as the join condition. Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-407598 Share on other sites More sharing options...
mysty Posted December 9, 2007 Author Share Posted December 9, 2007 I am almost embarassed to admit that I have tried this several different ways, but to no avail. Can you be specific as to the query? Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-410580 Share on other sites More sharing options...
fenway Posted December 10, 2007 Share Posted December 10, 2007 No problem (untested): SELECT t1.* FROM tbl_data AS t1 INNER JOIN ( SELECT t2.lastname, t2.firstname, COUNT(*) AS cnt FROM tbl_data AS t2 GROUP BY t2.lastname, t2.firstname HAVING cnt > 1 ) AS t2 ON ( t2.lastname = t1.lastname AND t2.firstname = t1.firstname) --fenway: edited a typo Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-410617 Share on other sites More sharing options...
mysty Posted December 11, 2007 Author Share Posted December 11, 2007 SELECT t1.* FROM tbl_data AS t1 INNER JOIN ( SELECT t2.lastname, t2.firstname, COUNT(*) AS cnt FROM tbl_data AS t2 GROUP BY t2.lastname, t2.firstname HAVING cnt > 1 ) AS t2 ON ( t2.lastname = t1.lastname AND t2.firstname = t2.firstname) Fantastic! This worked after I made one very small change - change a "t2" to "t1". The last line should look like this: HAVING cnt > 1 ) AS t2 ON ( t2.lastname = t1.lastname AND t2.firstname = t1.firstname) Thanks so much. This has really been a big help! Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-411490 Share on other sites More sharing options...
mysty Posted December 11, 2007 Author Share Posted December 11, 2007 Would it be possible to split these results into 5 or 6 output lists? Apparently the database and dupe list is getting pretty large and it takes about 2 minutes to display the results. BTW - I have 20,000 names and now I have an accurate list of dupes at about 7,000. Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-412364 Share on other sites More sharing options...
mysty Posted December 12, 2007 Author Share Posted December 12, 2007 I think I would like to try having the display show only the results of lastname that begin with a certain letter. That is, display the results of all duplicates (from the query above) that have a last name that begins with the letter A. What changes to the recordset would be needed? Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-412762 Share on other sites More sharing options...
fenway Posted December 12, 2007 Share Posted December 12, 2007 Would it be possible to split these results into 5 or 6 output lists? Apparently the database and dupe list is getting pretty large and it takes about 2 minutes to display the results. BTW - I have 20,000 names and now I have an accurate list of dupes at about 7,000. 2 mins to display (php) or to query (mysql)? I think I would like to try having the display show only the results of lastname that begin with a certain letter. That is, display the results of all duplicates (from the query above) that have a last name that begins with the letter A. What changes to the recordset would be needed? You'd need to add a where clause... WHERE firstname LIKE 'B%' Quote Link to comment https://forums.phpfreaks.com/topic/79904-help-in-displaying-duplicates-in-database/#findComment-412964 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.