optimizing the pieces


abazoskib

OK, so I've been posting random optimization questions, and I think it's time to put it all together in one thread. I have a client_temp table and a client table. The client_temp table is merely a staging area where data waits to be processed: good data is moved to the client table, and bad data to the client_bad table. To process this data, I have to check whether the primary email address exists in a few other tables. I'm getting around 200,000 entries a day in the client_temp table, so it's a lot.

Here is the series of steps each row has to go through for processing:

1. Validity (eregi and MX records). If it is GOOD, proceed; otherwise place the record into the bad array for bulk processing later.
2. SELECT email FROM badrecords1 WHERE email='$email'. If it is GOOD (no match), proceed; otherwise place the record into the bad array for bulk processing later.
3. SELECT email FROM badrecords2 WHERE email='$email'. If it is GOOD (no match), proceed; otherwise place the record into the bad array for bulk processing later.
4. SELECT email FROM badrecords3 WHERE email='$email'. If it is GOOD (no match), proceed; otherwise place the record into the bad array for bulk processing later.
5. SELECT email FROM client WHERE email='$email'. This checks for duplicates. If it is GOOD (no match), the record is placed in the good array for processing later; otherwise it goes into the duplicates array.

In the end I have an array of good entries, an array of bad entries, and an array of duplicates. At the end of processing, the good entries are inserted all at once, the bad ones are inserted all at once, and the duplicates are updated all at once.
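
The bulk write at the end of a run can be sketched roughly as follows. This is a minimal illustration, not the poster's code: it uses Python's built-in `sqlite3` as a stand-in database, and the `hits` and `error_code` columns, the sample rows, and what a duplicate "update" actually changes are all assumptions.

```python
import sqlite3

# In-memory database as a stand-in; the hits and error_code columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (email TEXT PRIMARY KEY, hits INTEGER DEFAULT 1);
    CREATE TABLE client_bad (email TEXT, error_code INTEGER);
""")
conn.execute("INSERT INTO client (email) VALUES ('old@example.com')")

# Results of the per-row checks, collected in memory (sample data).
good = [("new@example.com",)]
bad = [("spam@example.com", 2)]          # (email, error code)
duplicates = [("old@example.com",)]

# One transaction with three bulk statements instead of one statement per row.
with conn:
    conn.executemany("INSERT INTO client (email) VALUES (?)", good)
    conn.executemany("INSERT INTO client_bad (email, error_code) VALUES (?, ?)", bad)
    conn.executemany("UPDATE client SET hits = hits + 1 WHERE email = ?", duplicates)

client_rows = conn.execute("SELECT email, hits FROM client ORDER BY email").fetchall()
bad_rows = conn.execute("SELECT email, error_code FROM client_bad").fetchall()
```

Wrapping the three statements in a single transaction means the database commits once per batch rather than once per row.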

All in all, this is not going as fast as I'd like. Each record needs 4 selects. I can provide more detail if necessary, but what can I do to optimize this further?

Well, you can UNION those 3 bad-record tables... and JOIN to #5 -- but I'm not sure if you can SQL-ify #1.
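
One possible reading of that suggestion, sketched with Python's built-in `sqlite3` as a stand-in database: a single query that LEFT JOINs the staged rows to the UNION of the three bad-record tables and to `client`, so every lookup for the whole batch happens in one pass. The sample rows and the `is_bad` / `is_duplicate` column names are made up for illustration, and `client.email` is assumed to be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client_temp (email TEXT);
    CREATE TABLE client      (email TEXT PRIMARY KEY);
    CREATE TABLE badrecords1 (email TEXT);
    CREATE TABLE badrecords2 (email TEXT);
    CREATE TABLE badrecords3 (email TEXT);
    INSERT INTO client_temp VALUES ('a@example.com'), ('b@example.com'),
                                   ('c@example.com'), ('d@example.com');
    INSERT INTO badrecords1 VALUES ('b@example.com');
    INSERT INTO badrecords3 VALUES ('c@example.com');
    INSERT INTO client      VALUES ('d@example.com');
""")

# One query replaces the four per-row SELECTs (steps 2 to 5).
rows = conn.execute("""
    SELECT t.email,
           b.email IS NOT NULL AS is_bad,        -- found in any bad-record table
           c.email IS NOT NULL AS is_duplicate   -- already present in client
    FROM client_temp AS t
    LEFT JOIN (
        SELECT email FROM badrecords1
        UNION
        SELECT email FROM badrecords2
        UNION
        SELECT email FROM badrecords3
    ) AS b ON b.email = t.email
    LEFT JOIN client AS c ON c.email = t.email
    ORDER BY t.email
""").fetchall()
```

Plain `UNION` removes duplicate emails across the three tables, so an address listed in more than one of them does not multiply rows in the result.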

Ah yes, very good. Here's a problem though: when an email address exists in any one of those tables, I produce an error code, and the error code for being in badrecords1 is different from the error code for being in badrecords3, and so on. I forgot to mention that above. I'm starting to think that the whole model of the program limits how far it can be optimized beyond where it is now.
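
A combined query does not necessarily lose the per-table error code. One way to keep it, shown here as a sketch with Python's built-in `sqlite3` as a stand-in database, is to tag each branch of a `UNION ALL` with a literal code. The codes 1, 2 and 3 and the sample rows are hypothetical; `MIN` is used so that an address found in several tables gets the code of the first table, mirroring the sequential checks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client_temp (email TEXT);
    CREATE TABLE badrecords1 (email TEXT);
    CREATE TABLE badrecords2 (email TEXT);
    CREATE TABLE badrecords3 (email TEXT);
    INSERT INTO client_temp VALUES ('a@example.com'), ('b@example.com'), ('c@example.com');
    INSERT INTO badrecords1 VALUES ('b@example.com');
    INSERT INTO badrecords3 VALUES ('b@example.com'), ('c@example.com');
""")

# Each UNION ALL branch carries its own (hypothetical) error code.
flagged = conn.execute("""
    SELECT t.email, MIN(b.error_code) AS error_code
    FROM client_temp AS t
    JOIN (
        SELECT email, 1 AS error_code FROM badrecords1
        UNION ALL
        SELECT email, 2 FROM badrecords2
        UNION ALL
        SELECT email, 3 FROM badrecords3
    ) AS b ON b.email = t.email
    GROUP BY t.email
    ORDER BY t.email
""").fetchall()
```

Rows that match no bad-record table are simply absent from `flagged`, so they can continue to the duplicate check.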

I don't get it. I'm basically running 5 checks per email (the validity check plus 4 selects), and then at the end one huge insert and one huge delete. How can it be going so slow? By slow I mean ~2000 emails/hour. I need it to be much faster, and I noticed the speed increase dramatically when I removed even one of the select queries. My biggest concern is the duplicate check, because it has to search over 1 million rows each time.

Now I'm thinking: is there any way to cache the whole 'final' table so that checking for duplicates is super fast?
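
One simple form of such a cache, sketched here with Python's built-in `sqlite3` as a stand-in database, is to read the email column of `client` once into an in-memory set and check each incoming address against it. The sample rows are made up, and this assumes the set of emails fits in memory and that nothing else writes to `client` during the run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (email TEXT PRIMARY KEY);
    INSERT INTO client VALUES ('old@example.com');
""")

# Load every known email once, instead of one SELECT per incoming row.
known = {row[0] for row in conn.execute("SELECT email FROM client")}

incoming = ["old@example.com", "new@example.com", "new@example.com"]
good, duplicates = [], []
for email in incoming:
    if email in known:            # average O(1) set lookup, no database round trip
        duplicates.append(email)
    else:
        good.append(email)
        known.add(email)          # a repeat later in the same batch counts as a duplicate
```

Adding each accepted address back into the set also catches duplicates within the batch itself, which a lookup against the table alone would miss until the bulk insert has run.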
