optimizing the pieces

abazoskib · August 4, 2009

OK, so I've been posting random optimization questions, and I think I'll put it all together in one thread. I have a client_temp table and a client table. The client_temp table is just a staging area where data waits to be processed: good data gets moved to the client table, and bad data goes to the client_bad table. To process a row, I have to see whether its primary email address exists in a few other tables. I'm getting around 200,000 entries a day in the client_temp table, so it is a lot.

Here is the series of steps each row has to follow for processing:

1. Validity (eregi and MX records): if it is GOOD, proceed; else place the record into an array for bulk processing later
2. SELECT email FROM badrecords1 WHERE email='$email': if it is GOOD, proceed; else place the record into an array for bulk processing later
3. SELECT email FROM badrecords2 WHERE email='$email': if it is GOOD, proceed; else place the record into an array for bulk processing later
4. SELECT email FROM badrecords3 WHERE email='$email': if it is GOOD, proceed; else place the record into an array for bulk processing later
5. SELECT email FROM client WHERE email='$email': checks for duplicates; if it is GOOD, the record is placed in an array for processing later

In the end I have an array of good entries, an array of bad entries, and an array of duplicates. At the end of processing, the entries in the good array get inserted all at once, the bad ones get inserted all at once, and the duplicates are updated all at once.

All in all, this is not going as fast as I'd like. Each record needs 4 selects. I can provide more detail if necessary, but what can I do to further optimize this?

fenway · August 4, 2009

Well, you can UNION those 3 bad record tables... and JOIN to #5 -- but I'm not sure if you can sql-ify #1.
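
A minimal sketch of that suggestion, for illustration only. The thread uses PHP and MySQL; Python's built-in sqlite3 is used here purely so the example is self-contained, and the SQL itself is plain enough that it should carry over. The single-column table layouts and sample addresses are assumptions, and the query assumes email is unique in client (otherwise the join would return a staged row more than once).

```python
import sqlite3

def classify(conn):
    """Classify every staged email in one set-based query
    instead of running four SELECTs per row."""
    sql = """
        SELECT t.email,
               CASE
                   WHEN b.email IS NOT NULL THEN 'bad'
                   WHEN c.email IS NOT NULL THEN 'duplicate'
                   ELSE 'good'
               END AS status
        FROM client_temp AS t
        LEFT JOIN (
            SELECT email FROM badrecords1
            UNION
            SELECT email FROM badrecords2
            UNION
            SELECT email FROM badrecords3
        ) AS b ON b.email = t.email
        LEFT JOIN client AS c ON c.email = t.email
        ORDER BY t.email
    """
    return conn.execute(sql).fetchall()

# Hypothetical schema and data, just enough to exercise the query.
conn = sqlite3.connect(":memory:")
for table in ("client_temp", "client", "badrecords1", "badrecords2", "badrecords3"):
    conn.execute(f"CREATE TABLE {table} (email TEXT)")
conn.executemany(
    "INSERT INTO client_temp VALUES (?)",
    [("a@x.test",), ("b@x.test",), ("c@x.test",)],
)
conn.execute("INSERT INTO badrecords2 VALUES ('b@x.test')")
conn.execute("INSERT INTO client VALUES ('c@x.test')")

rows = classify(conn)
```

The plain UNION removes repeats across the three bad tables, so an address listed in more than one of them still yields a single row per staged email. Step #1 (eregi and MX lookups) stays in application code.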

abazoskib · August 4, 2009

Ah yes, very good. Here's a problem though: when an email address exists in any one of those tables, I produce an error code, so the error code for being in badrecords1 is different from the error code for being in badrecords3, and so on. I did forget to mention that above. I'm starting to think that the whole model of the program limits how far it can be optimized.
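
One way a combined query could still report a different code per table is to tag each branch with a literal. This is only a sketch: the codes 1, 2 and 3 are made up (the thread never gives the real values), the single-column tables are assumed, and sqlite3 stands in for MySQL so the example can run on its own.

```python
import sqlite3

# Hypothetical error codes, one literal per source table.
TAGGED_BAD = """
    SELECT email, 1 AS error_code FROM badrecords1
    UNION ALL
    SELECT email, 2 FROM badrecords2
    UNION ALL
    SELECT email, 3 FROM badrecords3
"""

def bad_emails_with_codes(conn):
    """Return (email, error_code) for every staged email found in a bad table.
    MIN() keeps the lowest code, which mimics checking the tables in order
    and stopping at the first hit."""
    sql = f"""
        SELECT t.email, MIN(b.error_code) AS error_code
        FROM client_temp AS t
        JOIN ({TAGGED_BAD}) AS b ON b.email = t.email
        GROUP BY t.email
        ORDER BY t.email
    """
    return conn.execute(sql).fetchall()

# Hypothetical schema and data.
conn = sqlite3.connect(":memory:")
for table in ("client_temp", "badrecords1", "badrecords2", "badrecords3"):
    conn.execute(f"CREATE TABLE {table} (email TEXT)")
conn.executemany(
    "INSERT INTO client_temp VALUES (?)",
    [("a@x.test",), ("b@x.test",), ("c@x.test",)],
)
conn.execute("INSERT INTO badrecords1 VALUES ('a@x.test')")
conn.executemany(
    "INSERT INTO badrecords3 VALUES (?)", [("a@x.test",), ("b@x.test",)]
)

coded = bad_emails_with_codes(conn)
```

UNION ALL is used rather than UNION because the literal column already makes each branch distinct, so there is nothing to de-duplicate.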

abazoskib · August 5, 2009

I don't get it. I'm basically running 5 selects per email, and then at the end one huge insert and one huge delete. How can it be going so slow? By slow I mean ~2000 emails/hour. I need it to be much faster, and I noticed the speed increase dramatically when removing even one of the select queries. My biggest concern is the duplicate check, because it has to search through over 1 million rows each time.

Now I'm thinking, is there any way to cache the whole 'final' table so that checking for duplicates is super fast?
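
A rough sketch of what such a cache might look like: read the email column once and keep it in an in-memory set, so each duplicate check is a hash lookup rather than a query. It is written in Python with sqlite3 only to stay self-contained; in PHP the same idea would be an array keyed by email and tested with isset(). The function names and sample data are hypothetical, and whether a million addresses fit comfortably in memory is an assumption worth measuring.

```python
import sqlite3

def load_known_emails(conn):
    """Read every client email once and keep them in a set."""
    return {row[0] for row in conn.execute("SELECT email FROM client")}

def split_duplicates(staged, known):
    """Split staged emails into new ones and ones already seen."""
    good, dupes = [], []
    for email in staged:
        if email in known:
            dupes.append(email)
        else:
            good.append(email)
            # Remember it so a repeat later in the same batch is a duplicate too.
            known.add(email)
    return good, dupes

# Hypothetical data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (email TEXT)")
conn.execute("INSERT INTO client VALUES ('c@x.test')")

known = load_known_emails(conn)
good, dupes = split_duplicates(["a@x.test", "c@x.test", "a@x.test"], known)
```

Set membership here is case-sensitive, whereas MySQL's default collations compare strings case-insensitively, so the addresses would probably need to be lowercased on both sides. An index on client.email may make the cache unnecessary in the first place.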

fenway · August 10, 2009

Well, you need to establish what the rate-limiting step is -- are the MySQL queries optimized? Check EXPLAIN.
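
To illustrate what checking the plan can reveal, here is a small sketch that compares the plan for the duplicate lookup before and after adding an index on email. It uses sqlite3 and its EXPLAIN QUERY PLAN statement so it can run on its own; in MySQL the equivalent is `EXPLAIN SELECT ...`, where `type: ALL` indicates a full table scan and the `key` column shows which index, if any, is used. The table layout and the index name idx_client_email are assumptions.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the human-readable detail text of each query-plan row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

# Hypothetical schema: no index on email yet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (id INTEGER PRIMARY KEY, email TEXT)")

query = "SELECT email FROM client WHERE email = ?"

# Without an index, the lookup has to scan the whole table.
before = plan(conn, query, ("a@x.test",))

# With an index on email, the lookup becomes an index search.
conn.execute("CREATE INDEX idx_client_email ON client (email)")
after = plan(conn, query, ("a@x.test",))

print(before)
print(after)
```

If the real client and badrecords tables turn out to have no index on email, every per-row SELECT is a full scan, which could go a long way toward explaining the throughput reported above.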
