Jump to content

Query Optimization


DaShiVa

Recommended Posts

Hi. My first post here, and it's one mother of a query. Well The original query is, however I've stripped it down to the core slowdown, so hopefully it's not too bad now.

Running MySQL Server version: 5.0.87-community; Protocol version: 10

 

I have a query I need to optimize, it's a search results.

The query is dynamically generated based on a fairly complex set of options, filters and sort orders, but I'll get a simple version of the query that is running slow, along with the explain, in the hopes someone can help me speed it up.

I've stripped out most of the fields it collects, as this doesn't impact the speed (however bear in mind I still need to join all the tables shown, as I am actually collecting data from each and every one of them in the final query - and quite a few more tables, that I've removed as they don't impact the query speed)

 

Query:

SELECT websites.website_id,
product_webpage_keywords.product_webpage_keyword_quantity, 
(
SELECT COUNT( product_deals.product_deal_id )
FROM product_deals
WHERE product_deals.product_webpage_keyword_id = product_webpage_keywords.product_webpage_keyword_id
AND product_deals.product_deal_status IN ( 1, 3 ) 
) AS num_product_keyword_deals 
FROM websites
INNER JOIN product_webpages ON product_webpages.website_id = websites.website_id
INNER JOIN product_webpage_keywords ON product_webpage_keywords.product_webpage_id = product_webpages.product_webpage_id
INNER JOIN product_keywords ON product_webpage_keywords.product_keyword_id = product_keywords.product_keyword_id
WHERE product_keywords.product_keyword_name LIKE 'Search'
HAVING num_product_keyword_deals < product_webpage_keywords.product_webpage_keyword_quantity
ORDER BY websites.website_url
LIMIT 0, 25 

 

EXPLAIN for above:

id  select_type         table                      type     possible_keys                          key                         key_len  ref                                                  rows    Extra
1   PRIMARY             product_webpages           ALL      PRIMARY,website_id                     NULL                        NULL     NULL                                                 269459  Using temporary; Using filesort
1   PRIMARY             websites                   eq_ref   PRIMARY                                PRIMARY                     4        product_webpages.website_id                          1    
1   PRIMARY             product_webpage_keywords   ref      product_webpage_id,product_keyword_id  product_webpage_id          4        product_webpages.product_webpage_id                  19    
1   PRIMARY             product_keywords           eq_ref   PRIMARY                                PRIMARY                     4        product_webpage_keywords.product_keyword_id          1       Using where
2   DEPENDENT SUBQUERY  product_deals              ref      product_webpage_keyword_id             product_webpage_keyword_id  4        product_webpage_keywords.product_webpage_keyword_id  1       Using where

#rows in relevant tables:

 

websites has ~100,000 rows

product_deals has ~10,000 rows

product_webpage_keywords has ~5,000,000 rows (this is the big one)

product_webpages has ~400,000 rows

product_keywords has ~10,000 rows

This takes over 30 seconds to run.

 

So far I have tried restructuring the query a half a dozen different ways, and I've added and changed indexes on the tables. Nothing has made a noticable difference.

Caching of the query is not helpful, as the query is run on a different set of filters and page (for many different users, which adds other filters) every time it is run.

Table structure changing is a remote option - I did not design the original tables, and changing them would require rewriting a lot of someone elses code, which I don't really have the time for. If it is a small-ish set of structural changes however, I will definitly consider it.

 

I've read all over about explain and how to read it, but I haven't had a lot of luck grokking this particular aspect of MYSQL, and I'm hoping that the explain shows something obvious to someone to help me out.

 

I can provide more information if it is useful, just ask.

 

Thanks, Dan.

Link to comment
Share on other sites

This topic may be too advanced for me, but what I see is in this subquery:

SELECT COUNT( product_deals.product_deal_id )
FROM product_deals
WHERE product_deals.product_webpage_keyword_id = product_webpage_keywords.product_webpage_keyword_id
AND product_deals.product_deal_status IN ( 1, 3 ) 

You reference two tables without a join.  This would mean many queries would need to be compiled.  *Try* taking the first part of the where out and see if the query executes faster.  If I am shooting in the wrong direction, sorry.

Link to comment
Share on other sites

The subquery links into the main query through the 'second table' in the where.

removing the 'where' would just end up counting all entries in the product_deals for each and every website row, rather than the deals that relate to that website (and other related filters) only.

Link to comment
Share on other sites

Anyone have any ideas? Or is able to "explain the EXPLAIN" to me (i.e. what parts of it I would want to change and how that would be accomplished)?

 

Or does anyone know if I could somehow turn the subquery in the SELECT into a dynamically generated table in the FROM to somehow move the HAVING into a WHERE while maintaining the same GROUP BY (which is needed due to the other tables that are JOINed in the final query)

 

Link to comment
Share on other sites

mysqlcheck -Aa -u"user" -p"pass"

 

I don't have shell access, but I am assuming you want me to analyze all tables?

I've done this with the relevant tables on phpMyAdmin and they're checking out. Also run repair and optimize.

Do you want me to add the analyze reports here? or was that just a facetious post?

 

Link to comment
Share on other sites

Subquery in column list IS a bad idea. Try rewriting it as a JOIN

 

SELECT websites.website_id,
product_webpage_keywords.product_webpage_keyword_quantity,
sq.num_product_keyword_deals
FROM websites
INNER JOIN product_webpages ON product_webpages.website_id = websites.website_id
INNER JOIN product_webpage_keywords ON product_webpage_keywords.product_webpage_id = product_webpages.product_webpage_id
INNER JOIN product_keywords ON product_webpage_keywords.product_keyword_id = product_keywords.product_keyword_id
INNER JOIN (
  SELECT product_webpage_keyword_id, COUNT( product_deals.product_deal_id ) AS num_product_keyword_deals
  FROM product_deals
  WHERE product_deals.product_deal_status IN ( 1, 3 )
  GROUP BY product_webpage_keyword_id
) AS sq
ON sq.product_webpage_keyword_id = product_webpage_keywords.product_webpage_keyword_id
WHERE product_keywords.product_keyword_name LIKE 'Search'
HAVING num_product_keyword_deals < product_webpage_keywords.product_webpage_keyword_quantity
ORDER BY websites.website_url
LIMIT 0, 25 

Link to comment
Share on other sites

Subquery in column list IS a bad idea. Try rewriting it as a JOIN

 

SELECT websites.website_id,
product_webpage_keywords.product_webpage_keyword_quantity,
sq.num_product_keyword_deals
FROM websites
INNER JOIN product_webpages ON product_webpages.website_id = websites.website_id
INNER JOIN product_webpage_keywords ON product_webpage_keywords.product_webpage_id = product_webpages.product_webpage_id
INNER JOIN product_keywords ON product_webpage_keywords.product_keyword_id = product_keywords.product_keyword_id
INNER JOIN (
  SELECT product_webpage_keyword_id, COUNT( product_deals.product_deal_id ) AS num_product_keyword_deals
  FROM product_deals
  WHERE product_deals.product_deal_status IN ( 1, 3 )
  GROUP BY product_webpage_keyword_id
) AS sq
ON sq.product_webpage_keyword_id = product_webpage_keywords.product_webpage_keyword_id
WHERE product_keywords.product_keyword_name LIKE 'Search'
HAVING num_product_keyword_deals < product_webpage_keywords.product_webpage_keyword_quantity
ORDER BY websites.website_url
LIMIT 0, 25 

 

The good news: it's a hell of a lot faster: 0.07-odd seconds.

The bad news: it returns 0 results :/

 

I haven't managed to figure out how to make it into a FROM without changing the results :(

(And changing it from an INNER to a LEFT join still returns no results, but returns it back up to ~30 seconds to execute)

Link to comment
Share on other sites

Does this:

SELECT product_webpage_keyword_id, COUNT( product_deals.product_deal_id ) AS num_product_keyword_deals
  FROM product_deals
  WHERE product_deals.product_deal_status IN ( 1, 3 )
  GROUP BY product_webpage_keyword_idp

 

return any rows?

Link to comment
Share on other sites

Does this:

SELECT product_webpage_keyword_id, COUNT( product_deals.product_deal_id ) AS num_product_keyword_deals
  FROM product_deals
  WHERE product_deals.product_deal_status IN ( 1, 3 )
  GROUP BY product_webpage_keyword_idp

 

return any rows?

 

~200 rows returned

Link to comment
Share on other sites

That would mean no rows match the join condition:

ON sq.product_webpage_keyword_id = product_webpage_keywords.product_webpage_keyword_id

 

Which is acceptable, as a 0 value for #deals means there are deals available, thus a valid result. this subquery is used to exclude matches what have no available deals (the having clause: subtracts used deals (this subquery) from available deals (out of product_webpage_keywords))

The problem is when I run the query with the subquery in the FROM tableset, as suggested, I get no results at all, but when it is in the SELECT fields, I get the full and proper results. (As I am running this on test data, on a low instance keyword, it is not surprising no deals exist for this particular query's candidates, but in the live environment there almost always will be some, and they must be filtered out)

Link to comment
Share on other sites

I get it now I think. Try like this.

SELECT websites.website_id,
product_webpage_keywords.product_webpage_keyword_quantity,
sq.num_product_keyword_deals
FROM websites
INNER JOIN product_webpages ON product_webpages.website_id = websites.website_id
INNER JOIN product_webpage_keywords ON product_webpage_keywords.product_webpage_id = product_webpages.product_webpage_id
INNER JOIN product_keywords ON product_webpage_keywords.product_keyword_id = product_keywords.product_keyword_id
INNER JOIN (
  SELECT product_webpage_keyword_id, COUNT( product_deals.product_deal_id ) AS num_product_keyword_deals
  FROM product_deals
  WHERE product_deals.product_deal_status IN ( 1, 3 )
  GROUP BY product_webpage_keyword_id
) AS sq
ON sq.product_webpage_keyword_id = product_webpage_keywords.product_webpage_keyword_id
WHERE product_keywords.product_keyword_name LIKE 'Search'
AND (sq.num_product_keyword_deals < product_webpage_keywords.product_webpage_keyword_quantity OR sq.num_product_keyword_deals IS NULL)
ORDER BY websites.website_url
LIMIT 0, 25 

Link to comment
Share on other sites

I get it now I think. Try like this.

SELECT websites.website_id,
product_webpage_keywords.product_webpage_keyword_quantity,
sq.num_product_keyword_deals
FROM websites
INNER JOIN product_webpages ON product_webpages.website_id = websites.website_id
INNER JOIN product_webpage_keywords ON product_webpage_keywords.product_webpage_id = product_webpages.product_webpage_id
INNER JOIN product_keywords ON product_webpage_keywords.product_keyword_id = product_keywords.product_keyword_id
INNER JOIN (
  SELECT product_webpage_keyword_id, COUNT( product_deals.product_deal_id ) AS num_product_keyword_deals
  FROM product_deals
  WHERE product_deals.product_deal_status IN ( 1, 3 )
  GROUP BY product_webpage_keyword_id
) AS sq
ON sq.product_webpage_keyword_id = product_webpage_keywords.product_webpage_keyword_id
WHERE product_keywords.product_keyword_name LIKE 'Search'
AND (sq.num_product_keyword_deals < product_webpage_keywords.product_webpage_keyword_quantity OR sq.num_product_keyword_deals IS NULL)
ORDER BY websites.website_url
LIMIT 0, 25 

 

Change the INNER to a LEFT (as you had implied with your NULL check) and it brings back the exact same results, meaning it works, and thanks, I prefer it to the select/having.

The query, however still runs at ~30 seconds (albeit about a second faster, but that's without doing extended testing)

 

Give me 10 minutes or so to rework the updated EXPLAIN output and I'll post it here.

Link to comment
Share on other sites

Updated Query:

SELECT websites.website_id,
product_webpage_keywords.product_webpage_keyword_quantity,
sq.num_product_keyword_deals
FROM websites
INNER JOIN product_webpages ON product_webpages.website_id = websites.website_id
INNER JOIN product_webpage_keywords ON product_webpage_keywords.product_webpage_id = product_webpages.product_webpage_id
INNER JOIN product_keywords ON product_webpage_keywords.product_keyword_id = product_keywords.product_keyword_id
LEFT JOIN (
  SELECT product_webpage_keyword_id, COUNT( product_deals.product_deal_id ) AS num_product_keyword_deals
  FROM product_deals
  WHERE product_deals.product_deal_status IN ( 1, 3 )
  GROUP BY product_webpage_keyword_id
) AS sq
ON sq.product_webpage_keyword_id = product_webpage_keywords.product_webpage_keyword_id
WHERE product_keywords.product_keyword_name LIKE 'Search'
AND (sq.num_product_keyword_deals < product_webpage_keywords.product_webpage_keyword_quantity OR sq.num_product_keyword_deals IS NULL)
ORDER BY websites.website_url
LIMIT 0, 25 

EXPLAIN:

id  select_type  table                     type    possible_keys                          key      key_len  ref                                          rows     Extra
1   PRIMARY      product_webpage_keywords  ALL     product_webpage_id,product_keyword_id  NULL     NULL     NULL                                         4371444  Using temporary; Using filesort
1   PRIMARY      product_webpages          eq_ref  PRIMARY,website_id                     PRIMARY  4        product_webpage_keywords.product_webpage_id  1          
1   PRIMARY      websites                  eq_ref  PRIMARY                                PRIMARY  4        product_webpages.website_id                  1          
1   PRIMARY      product_keywords          eq_ref  PRIMARY                                PRIMARY  4        product_webpage_keywords.product_keyword_id  1        Using where
1   PRIMARY      <derived2>                ALL     NULL                                   NULL     NULL     NULL                                         120      Using where
2   DERIVED      product_deals             ALL     NULL                                   NULL     NULL     NULL                                         1286     Using where; Using temporary; Using filesort

 

Results as good as the same as original (some NULLs in place of 0's, but that works fine) Execution time seems to be a hair faster, but still ~30 seconds

Link to comment
Share on other sites

Well, after a couple days more digging and experimenting I've figured it out:

It was the MySQL optomizer.

So, rewriting teh query slightly (for optimum table order) and adding the STRAIGHT_JOIN keyword after SELECT (to override the optomizer) and it's gone from ~30 to ~1.5 seconds.

And, as the original query inner joins in about a dozen tables and left joins a dozen more, and is dynamically generated, blah blah blah, I'm about to spend the next couple of days manually optimizing the whole query manually and rewriting the code that puts it together, which is ~5,000 lines (well, could probably be squished into ~2-3,000 but it's still a lot)

 

 

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.