
searching for duplicates before insertion


abazoskib

Recommended Posts

I have a live data feed coming into my database from several different sources. I do not accept duplicate entries for the email address field, so I check each incoming email with a function:

 

function checkDuplicates($email){
    $queryCD = "SELECT email FROM customer WHERE email='$email'";
    $resultCD = mysql_query($queryCD);
    $numRows = mysql_num_rows($resultCD);
    if($numRows > 0) return false; //EMAIL IS A DUPLICATE
    else return true; //EMAIL IS GOOD
}

 

I need to optimize everything to the max. Am I doing it the best way possible?

You could add LIMIT 1 to the end of your query, so if the email does exist MySQL stops at the first match rather than continuing the search.

 

SELECT email FROM customer WHERE email='$email' LIMIT 1

 

I'd also probably change the query itself to SELECT COUNT so it returns a number, rather than running a query and then counting rows with mysql_num_rows.

 

My function would be:

 

function checkDuplicates($email){
    $resultCount = mysql_query("SELECT COUNT(email) FROM customer WHERE email='$email' LIMIT 1");
    $row = mysql_fetch_row($resultCount);
    if($row[0] > 0) return false; //EMAIL IS A DUPLICATE
    else return true; //EMAIL IS GOOD
}

There are over a million records in that one table, and it's still growing. I've noticed that if I do a SELECT ... WHERE email='$email', I get a result faster than with a SELECT COUNT(email) ... WHERE email='$email'. A few seconds faster, actually, which makes all the difference.

You know there is an INSERT ... ON DUPLICATE KEY UPDATE ... syntax, right?

 

Yes, but I need to know when an email is a duplicate, for analytical reasons. Also, the primary key is composed of an auto-incremented key and the email, so an email could potentially be inserted twice, since the first part of the key will always be unique.

I'm taking a guess here (because I've not checked the MySQL docs), but if you make your email column a unique index (as you should be doing if it's unique anyway), then INSERT ... ON DUPLICATE KEY UPDATE ... should work for you.
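
If that index isn't in place yet, adding it would be something along these lines (just a sketch; I'm borrowing the customer table and email column names from the query you posted, and the index name is up to you):

ALTER TABLE customer ADD UNIQUE INDEX email_unique (email);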

However, because you don't actually want to UPDATE anything on a duplicate, you're better off using IGNORE,

e.g.

INSERT IGNORE INTO <tablename> (col, ...) VALUES (1, ...);
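
If you go the IGNORE route but still want to flag duplicates for your analytics, one rough sketch (assuming the unique index on email mentioned above, and the same ext/mysql functions you're already using) is to check mysql_affected_rows() after the insert, since an ignored duplicate affects zero rows:

$sql = sprintf(
    "INSERT IGNORE INTO customer (email) VALUES ('%s')",
    mysql_real_escape_string($email)
);
mysql_query($sql);
if(mysql_affected_rows() > 0){
    // row was inserted, so the email was new
} else {
    // nothing was inserted, so the email already existed - log it for your stats
}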

 

However, because an INSERT has a higher overhead than a SELECT, you would probably be better off doing the SELECT first (as you were).

 

Going back to the original SELECT query: I see no need for the LIMIT 1, because if your email field is unique (you don't accept dupes) then you will only ever get one result.

Thus:

 

<?php
/**
* Email class for performing acts on email addresses.
*/
class Email {

  private $_email;
  private $_isDuplicate;
  private $_db;

  public function __construct($email, $db = null){
    $this->_email = $email;
    if(null === $db) {
      $db = mysql_connect('localhost','username','password');
      mysql_select_db('database');
    }
    $this->_db = $db;
  }

  private function getDB(){
    return $this->_db;
  }

  public function getEmail(){
    return $this->_email;
  }

  public function isDuplicate(){
    if(null === $this->_isDuplicate){
      $this->_isDuplicate = false;
      $result = mysql_fetch_row(mysql_query(
        sprintf(
          "SELECT COUNT(*) FROM customer WHERE email='%s'",
          mysql_real_escape_string($this->getEmail())
         )
        , $this->getDB()
      ));
      if($result[0]){
        $this->_isDuplicate = true;
      }
    }
    return $this->_isDuplicate;    
  }
}

/**
* @example
*/
foreach($emails as $email){
  $test = new Email($email, $db);
  if(!$test->isDuplicate()){
    // INSERT new record.
  }
}
?>

 

 

The above code is pretty extravagant for what you're looking for, but I hope you get the idea.
