
[SOLVED] subquery REALLY slow


crazytonyi


If anyone needs background on this project, or wants to offer more broad advice, go to this post:

http://www.phpfreaks.com/forums/index.php/topic,231571.0.html

 

So, now I've got my queries out of any loops, so I'm feeling a lot better overall. However, I'm still getting problems with sub-queries.

 

For those of you who are keeping up, I got around the looping by having the swap query omit any overlapping shift times all at once. I realized that my script had to pull the user's shifts each time the page loaded in order to display them, so now when it loops through them to echo them out to the user, it also adds the shift to a string like this:

 

$conflict_dates .= 
"AND shiftstart NOT BETWEEN FROM_UNIXTIME('".(strtotime($shift_row['shiftstart']) + 60)."') AND FROM_UNIXTIME('".(strtotime($shift_row['shiftend']) - 60)."')
AND FROM_UNIXTIME('".(strtotime($shift_row['shiftstart']) + 60)."') NOT BETWEEN shiftstart AND shiftend\n";

 

Having done that, I'm now down to 3 queries:

  • The first gets the user's shifts
  • The second finds any OTHER users working at the same time as the shift the user wants to swap
  • The third finds shifts that are the same length as the swap shift AND don't conflict with any of the user's shifts AND don't belong to any of the blacklisted users from the second query.

 

Now, the first one MUST happen, because without it, the user won't have a list of shifts to choose from. But the second should be a subquery of the third one. But when I tried doing this, the page took 47 seconds to load versus doing the two queries separately which took under half a second. In either case, the same results are returned, so I know I'm not totally doing this wrong. But obviously I'm missing something, because the query shouldn't take 100 times longer.

 

The MySQL server version is 5.0.60.

 

Here's the gist of how it looks:

 

The first query gets all of the user's shifts within a predetermined date range:

 

SELECT * FROM shifts
WHERE
userid = 'joe_smith'
AND shiftstart BETWEEN FROM_UNIXTIME('epoch_of_Monday') AND FROM_UNIXTIME('epoch_of_Friday')

 

As mentioned above, the script outputs the shifts to the user AND makes a string variable for the last query.

 

 

The next query (after the user chooses a shift) gets the user IDs of anyone who has a shift that overlaps with the shift chosen:

 

SELECT DISTINCT userid FROM shifts
WHERE
(
shiftstart BETWEEN FROM_UNIXTIME('shift_start_plus_1_minute') AND FROM_UNIXTIME('shift_end_minus_1_minute')
OR
FROM_UNIXTIME('epoch_of_shift_start_plus_1_minute') BETWEEN shiftstart AND shiftend
)

 

Now, having a blacklist of users (formatted correctly by the script), it queries for any shifts that meet the criteria for swapping:

 

SELECT * FROM shifts
WHERE
userid NOT IN (list_of_userids) 
AND shiftstart BETWEEN FROM_UNIXTIME('epoch_of_Monday') AND FROM_UNIXTIME('epoch_of_Friday')
AND (TIME_TO_SEC(TIMEDIFF(shiftend,shiftstart)) = 'shift_length')
-- conflicts string, written out for this post
AND shiftstart NOT BETWEEN FROM_UNIXTIME('joes_shift1_start') AND FROM_UNIXTIME('joes_shift1_end')
AND FROM_UNIXTIME('joes_shift1_start') NOT BETWEEN shiftstart AND shiftend
AND shiftstart NOT BETWEEN FROM_UNIXTIME('joes_shift2_start') AND FROM_UNIXTIME('joes_shift2_end')
AND FROM_UNIXTIME('joes_shift2_start') NOT BETWEEN shiftstart AND shiftend
AND shiftstart NOT BETWEEN FROM_UNIXTIME('joes_shift3_start') AND FROM_UNIXTIME('joes_shift3_end')
AND FROM_UNIXTIME('joes_shift3_start') NOT BETWEEN shiftstart AND shiftend
AND shiftstart NOT BETWEEN FROM_UNIXTIME('joes_shift4_start') AND FROM_UNIXTIME('joes_shift4_end')
AND FROM_UNIXTIME('joes_shift4_start') NOT BETWEEN shiftstart AND shiftend
AND shiftstart NOT BETWEEN FROM_UNIXTIME('joes_shift5_start') AND FROM_UNIXTIME('joes_shift5_end')
AND FROM_UNIXTIME('joes_shift5_start') NOT BETWEEN shiftstart AND shiftend
ORDER BY shiftstart, lastname
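
Each pair of NOT BETWEEN conditions above is really asking whether a candidate shift overlaps one of Joe's shifts. As a rough sketch, and assuming that shifts which only touch end-to-start should not count as a conflict, each pair could probably be collapsed into a single interval-overlap test. With strict comparisons the one-minute padding should no longer be needed, so the placeholders here stand for the unpadded start and end times:

```sql
-- Two intervals overlap when each one starts before the other ends.
-- The strict < and > mean a shift that begins exactly when one of
-- Joe's ends (or ends exactly when one begins) is NOT a conflict.
SELECT * FROM shifts
WHERE
userid NOT IN (list_of_userids)
AND shiftstart BETWEEN FROM_UNIXTIME('epoch_of_Monday') AND FROM_UNIXTIME('epoch_of_Friday')
AND TIME_TO_SEC(TIMEDIFF(shiftend, shiftstart)) = 'shift_length'
AND NOT (shiftstart < FROM_UNIXTIME('joes_shift1_end') AND shiftend > FROM_UNIXTIME('joes_shift1_start'))
AND NOT (shiftstart < FROM_UNIXTIME('joes_shift2_end') AND shiftend > FROM_UNIXTIME('joes_shift2_start'))
AND NOT (shiftstart < FROM_UNIXTIME('joes_shift3_end') AND shiftend > FROM_UNIXTIME('joes_shift3_start'))
AND NOT (shiftstart < FROM_UNIXTIME('joes_shift4_end') AND shiftend > FROM_UNIXTIME('joes_shift4_start'))
AND NOT (shiftstart < FROM_UNIXTIME('joes_shift5_end') AND shiftend > FROM_UNIXTIME('joes_shift5_start'))
ORDER BY shiftstart, lastname
```

This halves the number of conditions per shift, though it has not been timed against the original form.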

 

Now, the way I did the subquery was to place the second query's variable name ($busy_query) inside the parentheses of the third query, replacing the variable that held the list of bad names. So, fully written out, it would look more like:

 

SELECT * FROM shifts
WHERE
userid NOT IN  (
SELECT DISTINCT userid FROM shifts
WHERE
(
shiftstart BETWEEN FROM_UNIXTIME('shift_start_plus_1_minute') AND FROM_UNIXTIME('shift_end_minus_1_minute')
OR
FROM_UNIXTIME('epoch_of_shift_start_plus_1_minute') BETWEEN shiftstart AND shiftend
)
) 
AND shiftstart BETWEEN FROM_UNIXTIME('epoch_of_Monday') AND FROM_UNIXTIME('epoch_of_Friday')
AND (TIME_TO_SEC(TIMEDIFF(shiftend,shiftstart)) = 'shift_length')
-- conflicts string, written out for this post
AND shiftstart NOT BETWEEN FROM_UNIXTIME('joes_shift1_start') AND FROM_UNIXTIME('joes_shift1_end')
AND FROM_UNIXTIME('joes_shift1_start') NOT BETWEEN shiftstart AND shiftend
AND shiftstart NOT BETWEEN FROM_UNIXTIME('joes_shift2_start') AND FROM_UNIXTIME('joes_shift2_end')
AND FROM_UNIXTIME('joes_shift2_start') NOT BETWEEN shiftstart AND shiftend
AND shiftstart NOT BETWEEN FROM_UNIXTIME('joes_shift3_start') AND FROM_UNIXTIME('joes_shift3_end')
AND FROM_UNIXTIME('joes_shift3_start') NOT BETWEEN shiftstart AND shiftend
AND shiftstart NOT BETWEEN FROM_UNIXTIME('joes_shift4_start') AND FROM_UNIXTIME('joes_shift4_end')
AND FROM_UNIXTIME('joes_shift4_start') NOT BETWEEN shiftstart AND shiftend
AND shiftstart NOT BETWEEN FROM_UNIXTIME('joes_shift5_start') AND FROM_UNIXTIME('joes_shift5_end')
AND FROM_UNIXTIME('joes_shift5_start') NOT BETWEEN shiftstart AND shiftend
ORDER BY shiftstart, lastname

 

Maybe it's too much for one query? But even when I do it without the conflicts, it takes the same amount of time.

 

Is this a syntax problem or a server issue or what?

 

Thanks!

 


I figured out the source of the problem, I think.

 

When I join the two queries together, the MySQL server treats the subquery as a "DEPENDENT SUBQUERY", which means, among other things, that it re-checks the subquery for EACH row of the outer query, when it really only needs to find the user IDs to skip once, BEFORE applying the other WHERE criteria. I learned this at:
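
For anyone who wants to check this on their own server, a minimal sketch: prefix the combined query with EXPLAIN and look at the select_type column. The placeholders are the same made-up ones as in the first post, and the conflict conditions are left out because they don't affect this part.

```sql
-- On this 5.0.x server the row for the inner SELECT shows up with
-- select_type = DEPENDENT SUBQUERY, i.e. it is re-evaluated for
-- rows of the outer query instead of being run once up front.
EXPLAIN
SELECT * FROM shifts
WHERE
userid NOT IN (
    SELECT DISTINCT userid FROM shifts
    WHERE shiftstart BETWEEN FROM_UNIXTIME('shift_start_plus_1_minute') AND FROM_UNIXTIME('shift_end_minus_1_minute')
    OR FROM_UNIXTIME('epoch_of_shift_start_plus_1_minute') BETWEEN shiftstart AND shiftend
)
AND shiftstart BETWEEN FROM_UNIXTIME('epoch_of_Monday') AND FROM_UNIXTIME('epoch_of_Friday')
AND TIME_TO_SEC(TIMEDIFF(shiftend, shiftstart)) = 'shift_length'
```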

 

http://forums.mysql.com/read.php?115,128477,128477

 

The suggested solution from the above site is basically what I was already doing, which is run two queries and pass the data from the first on to the second.
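
One single-query alternative that might be worth trying, sketched here without having been benchmarked on this data: move the blacklist into a derived table in the FROM clause and exclude matches with LEFT JOIN ... IS NULL. As far as I understand it, MySQL 5.0 builds a derived table once as a temporary table rather than re-running it per row. The aliases s and busy are made up for this example, and it assumes userid is never NULL (NOT IN and the join behave differently if it can be).

```sql
SELECT s.*
FROM shifts AS s
LEFT JOIN (
    -- the blacklist: anyone with a shift overlapping the chosen one
    SELECT DISTINCT userid FROM shifts
    WHERE shiftstart BETWEEN FROM_UNIXTIME('shift_start_plus_1_minute') AND FROM_UNIXTIME('shift_end_minus_1_minute')
    OR FROM_UNIXTIME('epoch_of_shift_start_plus_1_minute') BETWEEN shiftstart AND shiftend
) AS busy ON busy.userid = s.userid
WHERE busy.userid IS NULL  -- keep only shifts whose owner is not on the blacklist
AND s.shiftstart BETWEEN FROM_UNIXTIME('epoch_of_Monday') AND FROM_UNIXTIME('epoch_of_Friday')
AND TIME_TO_SEC(TIMEDIFF(s.shiftend, s.shiftstart)) = 'shift_length'
-- the conflicts string can be appended here unchanged: busy only
-- exposes userid, so shiftstart and shiftend are not ambiguous
ORDER BY s.shiftstart, s.lastname
```

Running EXPLAIN on this version should show the inner SELECT as DERIVED instead of DEPENDENT SUBQUERY, which is a quick way to confirm the rewrite took effect before timing it.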

 

If anyone knows a better technique, I'd really appreciate the advice.

 
