Jump to content

[SOLVED] Query Optimization


roopurt18

Recommended Posts

I've got this query that reports details on a group of users within our system I'd like to optimize.  On average, the query takes 3.5 - 4.25s to run.

SELECT 
  you.user_wvid AS UserID, 
  LOWER(you.login) AS Login, 
  LOWER(you.password) AS `Password`, 
  you.subname AS SubName, 
  s.full_name AS FullName, 
  @onsites:=SUM(IF(t.onsite=1, 1, 0)) AS HasOnSites,
  @offsites:=SUM(IF(t.onsite=0, 1, 0)) AS HasOffSites,
  CASE 
    WHEN @onsites > 0 AND @offsites > 0 THEN
      'both'    
    WHEN @onsites > 0 AND @offsites = 0 THEN
      'onsite'     
    WHEN @offsites > 0 AND @onsites = 0 THEN
      'offsite'
  END AS SubType, 
  MAX(a.TimeRecord) AS LastActiveStamp,
  IFNULL(DATE_FORMAT(a.TimeRecord, '%a. %b. %D, %Y'), '-') AS LastActiveDisp 

FROM
  wv_user you
  CROSS JOIN wssubc s ON you.subname=s.subname 
  CROSS JOIN SubProjectAccess pa ON you.subname=pa.SubName 
  CROSS JOIN wstrades t ON pa.Trade=t.tcode 
  LEFT JOIN UserActivity a ON you.login=a.UserLogin

WHERE you.role_wvid=2
GROUP BY you.login
ORDER BY you.login
LIMIT 0,25

 

EXPLAIN gives...

explain_orig.JPG

 

I'm trying to get rid of the using temporary in the EXPLAIN results but not having much luck.

 

Any suggestions?

Link to comment
Share on other sites

Try putting a multi-column index on (role_wvid, login) and try the query again. Use an EXPLAIN and post the results. If MYSQL does not use the new index, try forcing it with "SELECT .... FROM wv_user AS you FORCE INDEX(indexname) CROSS JOIN ....".

 

If the index is still not used then you can SELECT and LIMIT the rows from the wv_user table first in a subquery and then do the JOIN

SELECT 
you.*, s.full_name .... etc etc
FROM
(
    SELECT
    sub_you.user_wvid AS UserID, 
    LOWER(sub_you.login) AS Login, 
    LOWER(sub_you.password) AS `Password`, 
    sub_you.subname AS SubName
    FROM
    wv_user AS sub_you
    WHERE
    sub_you.role_wvid = 2
    ORDER BY
    sub_you.login
    LIMIT 0,25
) AS you
.......
CROSS JOIN ...
CROSS JOIN ....
...
GROUP BY
you.login
ORDER BY
you.login

 

Note that there is no second WHERE clause. The ORDER BY is repeated because it shouldn't be assumed that the JOIN will leave the result in the order created by the subquery.

 

Could you also post each EXPLAIN after each attempt, even if one of the solution works.

 

Btw, the following line should be.

 

IFNULL(DATE_FORMAT(MAX(a.TimeRecord), '%a. %b. %D, %Y'), '-') AS LastActiveDisp 

 

I'd also recommend against using the variables and instead do.

 

CASE 
    WHEN SUM(IF(t.onsite=1, 1, 0)) > 0 AND  SUM(IF(t.onsite=0, 1, 0)) > 0 THEN
    'both'
etc

Link to comment
Share on other sites

I'm not at work anymore so I'll have to try your suggestions tomorrow, but...

 

MAX(a.TimeRecord) AS LastActiveStamp,
IFNULL(DATE_FORMAT(a.TimeRecord, '%a. %b. %D, %Y'), '-') AS LastActiveDisp

 

I can't be sure without the code in front of me, but I think the program itself needs the timestamp as well, which is why I'm returning two separate fields there.

 

As for the variables, my original query wasn't using them and I had added them thinking it would probably be more optimal.  Is there any reason why that's not the case?

 

I'll report back on the other stuff some time tomorrow.

 

Thanks for the responses.

Link to comment
Share on other sites

I'm not at work anymore so I'll have to try your suggestions tomorrow, but...

 

MAX(a.TimeRecord) AS LastActiveStamp,
IFNULL(DATE_FORMAT(a.TimeRecord, '%a. %b. %D, %Y'), '-') AS LastActiveDisp

 

I can't be sure without the code in front of me, but I think the program itself needs the timestamp as well, which is why I'm returning two separate fields there.

 

I meant that the second line should have DATE_FORMAT(MAX(a.timeRecord), the MAX() is missing.

 

As for the variables, my original query wasn't using them and I had added them thinking it would probably be more optimal.  Is there any reason why that's not the case?

 

My first reason for suggesting you not use the variables is

 

The general rule is never to assign a value to a user variable in one part of a statement and use the same variable in some other part of the same statement. You might get the results you expect, but this is not guaranteed.

 

I also believe that identical expressions are evaluated only once but I can't find the entry in the manual (assuming it exists) that corroborates that. Assume that I'm wrong for now.

Link to comment
Share on other sites

I placed a multi-column index (role_wvid, login) on wv_user and the using temporary; using filesort has been eliminated.

explain_multi_idx.jpg

 

Would someone be willing to offer a brief explanation or link to an explanation of why the multi-column index made a difference in this case?  I'm still a bit of a noob when it comes to DB optimization.

 

Also, if the LIMIT is left off the query or if the second part of the limit extends the actual number of records (LIMIT 100,25 on a table with 102 records), I notice that using temporary; using filesort makes its way back into EXPLAIN.  What would be the cause of this? Would it be more efficient to precede this query with another that counts the number of records in wv_user and stops the LIMIT from extending beyond the actual number of records?

Link to comment
Share on other sites

I've now removed the multi-column index I created earlier on wv_user and restructured the query:

SELECT
  u.user_wvid AS UserID,
  LOWER(u.login) AS Login,
  LOWER(u.password) AS `Password`,
  u.subname AS SubName,
  s.full_name AS FullName,
  SUM(IF(t.onsite=1, 1, 0)) AS HasOnSites,
  SUM(IF(t.onsite=0, 1, 0)) AS HasOffSites,
  CASE
    WHEN SUM(IF(t.onsite=1, 1, 0)) > 0 AND SUM(IF(t.onsite=0, 1, 0)) > 0 THEN
      'both'
    WHEN SUM(IF(t.onsite=1, 1, 0)) > 0 AND SUM(IF(t.onsite=0, 1, 0)) = 0 THEN
      'onsite'
    WHEN SUM(IF(t.onsite=0, 1, 0)) > 0 AND SUM(IF(t.onsite=1, 1, 0)) = 0 THEN
      'offsite'
  END AS SubType,
  IFNULL(
    (
    SELECT a.TimeRecord AS LastActiveStamp FROM UserActivity a
    WHERE a.UserLogin=u.login
    ORDER BY a.TimeRecord DESC LIMIT 1
    ), '-') AS LastActiveStamp,
  IFNULL( DATE_FORMAT(
    (
    SELECT a.TimeRecord AS LastActiveStamp FROM UserActivity a
    WHERE a.UserLogin=u.login
    ORDER BY a.TimeRecord DESC LIMIT 1
    ), '%a. %b. %D, %Y' ), '-') AS LastActiveDisp

FROM wv_user u
CROSS JOIN wssubc s ON u.subname=s.subname
CROSS JOIN SubProjectAccess pa ON u.subname=pa.SubName
CROSS JOIN wstrades t ON pa.Trade=t.tcode

WHERE
  u.role_wvid=2

GROUP BY u.login
ORDERY BY _xyz_
LIMIT 0,25

 

I've structured the query in this manner because there is an external parameter that determines the ORDER BY, which can be any of the fields returned by the query (UserID, Login, LastActiveStamp, etc).  Using any of the fields with ORDER BY will cause using temporary to show up in EXPLAIN, but it's consistently faster than 1s so I'm happy with the results for now.

 

Also, I had forgotten about using sub-queries since our last host was MySQL 4.0.x, thanks for reminding me!

Link to comment
Share on other sites

I placed a multi-column index (role_wvid, login) on wv_user and the using temporary; using filesort has been eliminated.

explain_multi_idx.jpg

 

Would someone be willing to offer a brief explanation or link to an explanation of why the multi-column index made a difference in this case?  I'm still a bit of a noob when it comes to DB optimization.

 

These are the relevant parts of the query that make the multi-column index useful.

 

WHERE you.role_wvid=2
ORDER BY you.login
LIMIT 0,25

 

The basic explanation is that without the multi-column index MYSQL had to retrieve ALL the rows from the disk that had a role_wvid with a value of 2 and then ORDER/sort and LIMIT the results using a temporary table.

 

With an index which can be thought of as being similar to the following

 

role_wvid    login
-------------------------
1             john
2             bill
2             jim
2             suzan
3             betty
3             mark

 

you'll notice that MYSQL will be able to lookup the rows with a lot_wvid value of 2 and retrieve the rows in the order of the "login" without much effort using only the index. This also allows MYSQL to retrieve only the first 25 rows in the table that match. The EXPLAIN thinks that it will have to look at 101 rows but when the actual query is run it doesn't(shouldn't) because of the LIMIT.

 

Many times I've put a multi-column index on the ORDER BY column before the WHERE clause column to achieve similar results as well. Keep in mind that this is my "understanding" of what happens and shouldn't be thought of as an authoritative answer.

 

Also, if the LIMIT is left off the query or if the second part of the limit extends the actual number of records (LIMIT 100,25 on a table with 102 records), I notice that using temporary; using filesort makes its way back into EXPLAIN.  What would be the cause of this?

 

I don't know why that occurs. It could be a bug that doesn't allow MYSQL to recognize that it can use the same plan to execute the query when the LIMIT extends beyond the number of records in the table. If the EXPLAIN also shows that it doesn't use the multi-column index, then a FORCE INDEX will probably help.

 

Would it be more efficient to precede this query with another that counts the number of records in wv_user and stops the LIMIT from extending beyond the actual number of records?

 

Usually you'd be using the LIMIT to create some kind of pagination for the results. Generally you'd retrieve the COUNT before hand to determine whether or not a "next" button should be displayed. Even if that's not the case retrieving the COUNT of the records in the wv_user table with a role_wvid of the specified value should be relatively quick.

 

In regard to your post about restructuring the query. Another approach in light of the variable ORDER BY column may be to create specific queries based on which column is used in the ORDER BY. SELECTing and LIMITing the results in a subquery using the relevant table. I haven't worked it out in my mind completely so it may or may not be practical in this instance.

 

You should find the following site helpful btw Optimizing MYSQL Queries

Link to comment
Share on other sites

I probably won't revisit this query for a while since I'm happy with the current results.  The whole page used to take 7 or 8 seconds to load and now it's down to about 1.5s.

 

Another approach in light of the variable ORDER BY column may be to create specific queries based on which column is used in the ORDER BY. SELECTing and LIMITing the results in a subquery using the relevant table.

I'll keep this in the back of my mind in case I do have to come back to it or face a similar situation elsewhere.

 

Again, thanks to everyone for all their help.

 

:)

Link to comment
Share on other sites

Another approach in light of the variable ORDER BY column may be to create specific queries based on which column is used in the ORDER BY. SELECTing and LIMITing the results in a subquery using the relevant table.

I'll keep this in the back of my mind in case I do have to come back to it or face a similar situation elsewhere.

 

After putting more thought into it I don't believe the suggestion is usable. Other ideas of JOINing the wv_user and activity table in a subquery may be better to keep in mind.

 

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.