Jump to content

[SOLVED] Finding duplicates in my DB


Recommended Posts

SELECT lastname, firstname, title, email, recordid

FROM tbl_data

GROUP BY lastname, firstname, email, lastname

HAVING (COUNT(*) > 1)

 

 

Here is the query (recordset in Dreamweaver) that I am using.  It lists all my dupes.  I would like to be able to list all the dupes, except list both dupes.  That is, instead of listing three dupes - smith, jones, brown, my list would show both smiths, both jones, both browns.

What changes are needed in my query?

I am using Dreamweaver 802, PHP, MySQL.

Link to comment
https://forums.phpfreaks.com/topic/40057-solved-finding-duplicates-in-my-db/
Share on other sites

Is what you really want the recordid of each row which is a duplicate according to the other 4 columns?  Try this:

 

SELECT recordid FROM
tbl_data NATURAL JOIN
( SELECT lastname, firstname, title, email
FROM tbl_data
GROUP BY lastname, firstname, email, lastname
HAVING (COUNT(*) > 1)
)

 

If your mysql doesn't support subqueries, then it'll need to be changed.. you didn't say which version of mysql you are using.

Hmm.. perhaps mysql can't join with a subquery.  That's a problem...

 

How about this..

 

CREATE TEMPORARY TABLE dups
AS SELECT lastname, firstname, title, email
FROM tbl_data
GROUP BY lastname, firstname, email, lastname
HAVING (COUNT(*) > 1);

SELECT recordid FROM tbl_data NATURAL JOIN dups;

 

That's the same thing, rewritten to use a temporary table.

Hmm.. perhaps mysql can't join with a subquery.  That's a problem...

 

No, you need to give your subquery an alias though.

You'll want to watch about joining on first names. Half the time you'll find the same person in there as Edward, Eddy, Ed.

SELECT td.*
FROM tbl_data AS td
JOIN (
   SELECT lastname, email,  MID(firstname,1,1) AS firstinit
   FROM tbl_data
   GROUP BY lastname, email, init
   HAVING (COUNT(*) > 1)
) AS sub ON td.lastname = sub.lastname AND td.email = sub.email 
AND  MID(td.firstname,1,1) = sub.firstinit

Here is the query copied from my recordset.

SELECT td.* FROM tbl_data AS td JOIN (    SELECT lastname, email,  MID(firstname,1,1) AS firstinit
FROM tbl_data
GROUP BY lastname, email, init
HAVING (COUNT(*) > 1) ) AS sub ON td.lastname = sub.lastname AND td.email = sub.email  AND  MID(td.firstname,1,1) = sub.firstinit

 

Here is the error message when the page is run:

You have an error in your SQL syntax. Check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELECT lastname, email, MID(firstname,1,1) AS firstinit FROM t

 

Let me know if there is any other info you need to troubleshoot this.

 

Logically the only thing that should be used to determine duplicates is the email address as different users can have the same firstname and lastname. Knowing that you can do the following.

 

<?php
$query = 'SELECT email, COUNT(*) AS cnt FROM tablename GROUP BY email HAVING cnt > 1 ';
$result = mysql_query($query) or die($query."<br />\n".mysql_error());
$dups = array();
while ($row = mysql_fetch_assoc($result))
{
    $dups[] = $row['email'];
}
$query = 'SELECT * FROM tablename WHERE email IN ("'.implode('","', $dups).'") ';
$result = mysql_query($query) or die($query."<br />\n".mysql_error());
?>

 

To determine duplicates using the original criteria you would GROUP BY fname,lname etc in the first query and then change the second query to compare all of the GROUP BY fields. When using this method one thing to note is, If the GROUP BY fields are the only fields in the table there should be no reason to list all of them as all of them would be the same.

 

I don't understand.  phpinfo shows the MySQL to be 5.0.18, and yet when I run the SQL query in phpmyadmin I get 4.0.27.

Could this be part of my problem.

 

SUBSELECTs aren't available in  MYSQL 4.0.

 

Is this normal or is 1and1 hosting a bit behind?

 

Your host more than likely has offerings that include access to more current versions of MYSQL but you can check the offerings of other hosts to see if they're behind by using 4.0 at all. Keep in mind that as far as I know MYSQL 4.0 is no longer being maintained.

 

 

I called my hosting company and they said they will be upgrading eventually.  Didn't sound like it was going to be too soon.  They said that I could upgrade to a dedicated server which means my monthly fee will multiply by about 10 to 15 times the current rate.  Guess I won't be upgrading - I may be back to this group with my question sometime in the future.  Hope it's in my lifetime.

I must have missed that post.  I guess I was online when it was entered.

Where does the code snippet go? 

 

You can put it in the section where you plan on outputting the results. If you rely on Dreamweaver to generate PHP code, then you may want read some of the tutorials on this and other sites to get a better understanding of what's happening in the snippet.

I put it in the page that outputs the results.  The code snippet went after the require once connection snippet and before the DOC statement.  Here is the error message.

 

SELECT email, COUNT(*) AS cnt FROM tbl_data GROUP BY email HAVING cnt > 1 
No Database Selected

I went back and used a clean page.  That is, the webpage I was using had several different thing I had tried, so probably had some remnants of who knows what.  When I tried the clean page, I got the dupe listing like before - a list with all the dupes, one at a time.  I moved the new code snippet around and got the same thing.  Would it help if I posted the section of code?

I went back and used a clean page.  That is, the webpage I was using had several different thing I had tried, so probably had some remnants of who knows what.  When I tried the clean page, I got the dupe listing like before - a list with all the dupes, one at a time.  I moved the new code snippet around and got the same thing.  Would it help if I posted the section of code?

 

You can post it.

Here is the code.  It is everything from line 1 to the DOC declaration, which I left in the first line.

<?php require_once('../Connections/con_cpg.php'); ?>
<?php
if (!isset($_SESSION)) {
  session_start();
}
$MM_authorizedUsers = "9";
$MM_donotCheckaccess = "false";

// *** Restrict Access To Page: Grant or deny access to this page
function isAuthorized($strUsers, $strGroups, $UserName, $UserGroup) { 
  // For security, start by assuming the visitor is NOT authorized. 
  $isValid = False; 

  // When a visitor has logged into this site, the Session variable MM_Username set equal to their username. 
  // Therefore, we know that a user is NOT logged in if that Session variable is blank. 
  if (!empty($UserName)) { 
    // Besides being logged in, you may restrict access to only certain users based on an ID established when they login. 
    // Parse the strings into arrays. 
    $arrUsers = Explode(",", $strUsers); 
    $arrGroups = Explode(",", $strGroups); 
    if (in_array($UserName, $arrUsers)) { 
      $isValid = true; 
    } 
    // Or, you may restrict access to only certain users based on their username. 
    if (in_array($UserGroup, $arrGroups)) { 
      $isValid = true; 
    } 
    if (($strUsers == "") && false) { 
      $isValid = true; 
    } 
  } 
  return $isValid; 
}

$MM_restrictGoTo = "../admin/admin_login.php";
if (!((isset($_SESSION['MM_Username'])) && (isAuthorized("",$MM_authorizedUsers, $_SESSION['MM_Username'], $_SESSION['MM_UserGroup'])))) {   
  $MM_qsChar = "?";
  $MM_referrer = $_SERVER['PHP_SELF'];
  if (strpos($MM_restrictGoTo, "?")) $MM_qsChar = "&";
  if (isset($QUERY_STRING) && strlen($QUERY_STRING) > 0) 
  $MM_referrer .= "?" . $QUERY_STRING;
  $MM_restrictGoTo = $MM_restrictGoTo. $MM_qsChar . "accesscheck=" . urlencode($MM_referrer);
  header("Location: ". $MM_restrictGoTo); 
  exit;
}
?>
<?php
if (!function_exists("GetSQLValueString")) {
function GetSQLValueString($theValue, $theType, $theDefinedValue = "", $theNotDefinedValue = "") 
{
  $theValue = get_magic_quotes_gpc() ? stripslashes($theValue) : $theValue;

  $theValue = function_exists("mysql_real_escape_string") ? mysql_real_escape_string($theValue) : mysql_escape_string($theValue);

  switch ($theType) {
    case "text":
      $theValue = ($theValue != "") ? "'" . $theValue . "'" : "NULL";
      break;    
    case "long":
    case "int":
      $theValue = ($theValue != "") ? intval($theValue) : "NULL";
      break;
    case "double":
      $theValue = ($theValue != "") ? "'" . doubleval($theValue) . "'" : "NULL";
      break;
    case "date":
      $theValue = ($theValue != "") ? "'" . $theValue . "'" : "NULL";
      break;
    case "defined":
      $theValue = ($theValue != "") ? $theDefinedValue : $theNotDefinedValue;
      break;
  }
  return $theValue;
}
}

$colname_data_list = "-1";
if (isset($_GET['recordid'])) {
  $colname_data_list = (get_magic_quotes_gpc()) ? $_GET['recordid'] : addslashes($_GET['recordid']);
}
mysql_select_db($database_con_cpg, $con_cpg);
$query_data_list = sprintf("SELECT * FROM tbl_data WHERE recordid = %s", GetSQLValueString($colname_data_list, "int"));
$data_list = mysql_query($query_data_list, $con_cpg) or die(mysql_error());
$row_data_list = mysql_fetch_assoc($data_list);
$totalRows_data_list = mysql_num_rows($data_list);

$maxRows_data_list = 100;
$pageNum_data_list = 0;
if (isset($_GET['pageNum_data_list'])) {
  $pageNum_data_list = $_GET['pageNum_data_list'];
}
$startRow_data_list = $pageNum_data_list * $maxRows_data_list;

mysql_select_db($database_con_cpg, $con_cpg);
$query_data_list = "SELECT lastname, firstname, title, company, products, email, entered, recordid FROM tbl_data GROUP BY lastname, firstname, email, lastname HAVING (COUNT(*) > 1)";
$query_limit_data_list = sprintf("%s LIMIT %d, %d", $query_data_list, $startRow_data_list, $maxRows_data_list);
$data_list = mysql_query($query_limit_data_list, $con_cpg) or die(mysql_error());
$row_data_list = mysql_fetch_assoc($data_list);
?>
<?php
// start the dupes routine
$query = 'SELECT email, COUNT(*) AS cnt FROM tbl_data GROUP BY email HAVING cnt > 1 ';
$result = mysql_query($query) or die($query."<br />\n".mysql_error());
$dups = array();
while ($row = mysql_fetch_assoc($result))
{
    $dups[] = $row['email'];
}
$query = 'SELECT * FROM tbl_data WHERE email IN ("'.implode('","', $dups).'") ';
$result = mysql_query($query) or die($query."<br />\n".mysql_error());
?>

<?php
if (isset($_GET['totalRows_data_list'])) {
  $totalRows_data_list = $_GET['totalRows_data_list'];
} else {
  $all_data_list = mysql_query($query_data_list);
  $totalRows_data_list = mysql_num_rows($all_data_list);
}
$totalPages_data_list = ceil($totalRows_data_list/$maxRows_data_list)-1;
?>



<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

 

It would appear that the code is using the result set referenced by the variable $data_list to show the results. If you change the return variable name of the second SELECT you should get the output you're looking for.

 

$data_list = mysql_query($query) or die($query."<br />\n".mysql_error());

 

From what I can see Dreamweaver has generated the majority of the code posted. If that is the case and you have further problems I'll have to suggest you read the following tutorial.

 

http://www.phpfreaks.com/tutorials/142/0.php

 

After doing so, create a small page without any of the Dreamweaver code and try to retrieve something from the database using the tutorial or any others you can find on the web as a guide. You'll be better able to troubleshoot the problem after you've done this.

 

You can direct any questions regarding your code to the PHP-Help forum and of course any MYSQL questions here. It's important to keep in mind that the forum is for users learning PHP. If you're not, it's difficult for those of us on the board to help.

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.