Jump to content

searching for duplicates before insertion


abazoskib

Recommended Posts

i have a live data feed coming into my database from several different sources. i do not accept duplicate entries for the email address field, and so i check each email coming in with a function:

 

function checkDuplicates($email){
                $queryCD = "SELECT email FROM customer WHERE email='$email'";
                $resultCD = mysql_query($queryCD);
                $numRows = mysql_num_rows($resultCD);
                if($numRows > 0) return false; //EMAIL IS A DUPLICATE
                else return true; //EMAIL IS GOOD
        }

 

I need to optimize everything to the max. Am I doing it the best way possible?

Link to comment
Share on other sites

You could add LIMIT 1 to the end of your query so if it does exist it will only find the first result and then exit the query from there rather than continuing the search.

 

SELECT email FROM customer WHERE email='$email' LIMIT 1

 

I'd also probably change the query itself to select count so it returns a number rather that doing a query and then returning num rows.

 

My function would be..

 

function checkDuplicates($email){

$queryCount = "mysql_query(SELECT count(email) WHERE email='$email' LIMIT 1)";
   if($queryCount > 0) return false; //EMAIL IS A DUPLICATE
     else return true; //EMAIL IS GOOD
}

Link to comment
Share on other sites

there are over a million records in the one table i have and currently growing. i noticed that if i do a select where email=$email for example, i get a result faster than a select count(email) where email=$email. A few seconds faster actually which makes all the difference.

Link to comment
Share on other sites

You know there is a INSERT ... ON DUPLICATE KEY UPDATE ... syntax, right?

 

yes, but i need to know when the email is a duplicate for analytical reasons. also, the primary key is composed of an auto incremented key and an email, so an email could potentially be inserted twice, knowing that the first part of the key will always be unique.

Link to comment
Share on other sites

I'm taking a guess here (because i've not checked MySQL docs) but if you make your email column a unique index (as you should be doing if it's unique anyway) then the INSERT ... ON DUPLICATE KEY UPDATE... should work for you.

However, because you don't actually want to UPDATE anything on a DUPLICATE then you're better off using IGNORE

e.g.

INSERT IGNORE INTO <tablename>('col', ....) VALUES(1,...);

 

However, because an INSERT has a higher overhead than a SELECT you would probably be better off doing the SELECT 1st (as you were).

 

Going back to the original point SELECT query, I see no need for the LIMIT 1 because if you're saying that your email field is unique (because you don't accept dupes) then you will only ever get 1 result.

Thus:

 

<?php
/**
* Email class for performing acts on email addresses.
*/
class Email {

  private $_email;
  private $_isDuplicate;

  public function __construct($email, $db = null){
    $this->_email = $email;
    if(null === $db) {
      $db = mysql_connect('localhost','username','password');
      mysql_select_db('database');
    }
    $this->_db = $db;
  }

  private function getDB(){
    return $this->_db;
  }

  public function getEmail(){
    return $this->_email;
  }

  public function isDuplicate(){
    if(null === $this->_isDuplicate){
      $this->_isDuplicate = false;
      $result = mysql_fetch_row(mysql_query(
        sprintf(
          "SELECT COUNT(*) FROM customer WHERE email='%s'",
          mysql_real_escape_string($this->getEmail())
         )
        , $this->getDB()
      ));
      if($result[0]){
        $this->_isDuplicate = true;
      }
    }
    return $this->_isDuplicate;    
  }
}

/**
* @example
*/
foreach($emails as $email){
  $test = new Email($email, $db);
  if(!$test->isDuplicate()){
    // INSERT new record.
  }
}
?>

 

 

The above code is pretty extravagant for what you're looking for, but you get the idea I hope.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.