Jump to content

Counting duplicates on bulk INSERT IGNORE / INSERT ... ON DUPLICATE KEY UPDATE


Recommended Posts

So I have a script that imports CSV's for contact lists into a table ( http://www.phpfreaks.com/forums/php-coding-help/large-csv-import/ ) and to make it as efficient as possible it detects duplicates purely in SQL rather than relying on PHP and bloating the code.

 

The script can be set to either overwrite or ignore any duplicates, and I need to be able to count how many duplicates it found. Depending on which option is set for how to handle duplicates, the query looks like the follow. Note: These are bulk inserts where multiple rows are inserted in a single query. This example shows 5 CSV rows in the batch query, but this number is much higher on the live script

 

INSERT IGNORE INTO contacts (site_id, c_registered, c_domain, c_subscribed, c_name, c_email) VALUES ('10', NOW(), 'domain.co.uk', '1', 'Chris', 'chris@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Cyn', 'cyn@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Chloe', 'chloe@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Jennie', 'jennie@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Steve', 'steve@domain.co.uk');

 

INSERT INTO contacts (site_id, c_registered, c_domain, c_subscribed, c_name, c_email) VALUES ('10', NOW(), 'domain.co.uk', '1', 'Chris', 'chris@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Cyn', 'cyn@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Chloe', 'chloe@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Jennie', 'jennie@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Steve', 'steve@domain.co.uk') ON DUPLICATE KEY UPDATE c_name = VALUES(c_name);

 

My problem is that I need somehow to be able to detect accurately how many duplicates there were mixed in with rows that weren't. Some people have suggested mysql_errno and mysql_affected_rows but these dont seem to work for the way I have implemented things

 

Any ideas?

Sorry I forgot to say, that only marks it as a duplicate if something has changed. I need it to be marked as a duplicate even if the row is identicle.

 

Actually just thought, this works if I update a field such as a time stamp :) nice one!!

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.