Jump to content

Help understanding why this Mysql Query is badly designed.


Recommended Posts

I'm currently using a external module from an opencart vendor on my opencart store, however I keep getting a error stating mysql has gone away. 

 

Thinking this might be an issue with the mysql wait time setting I contacted my server support team and asked them to investigate, below is there answer. 

 

I'm struggling to understand exactly why the query is so badly designed as stated below, any advice on the subject would be greatly appreciated

 

 

 

Hello,

After a lot of investigation we have determined the following:

The query that the script is trying to execute is:

------------------------------------------------------------------------------------------------------------
# Time: 151219 13:26:42
# User@Host: ******[******] @ [*******]
# Query_time: 124.617125 Lock_time: 0.004025 Rows_sent: 17 Rows_examined: 7165101
SET timestamp=1450531602;
SELECT *,
o.date_added AS date,
YEAR(o.date_added) AS year,
QUARTER(o.date_added) AS quarter,
MONTHNAME(o.date_added) AS month,
MIN(o.date_added) AS date_start,
MAX(o.date_added) AS date_end,
op.order_id,
op.product_id,
op.order_product_id, (SELECT p.image FROM `oc_product` p WHERE op.product_id = p.product_id) AS image,
(SELECT p.sku FROM `oc_product` p WHERE op.product_id = p.product_id) AS sku,
op.name,
op.model,
(SELECT cd.name FROM `oc_category_description` cd, `oc_category` c, `oc_product_to_category` p2c WHERE c.category_id = cd.category_id AND p2c.category_id = c.category_id AND cd.language_id = '1' AND op.product_id = p2c.product_id GROUP BY op.product_id) AS category,
(SELECT GROUP_CONCAT(cd.name SEPARATOR ', ') FROM `oc_category_description` cd, `oc_category` c, `oc_product_to_category` p2c WHERE op.product_id = p2c.product_id AND p2c.category_id = c.category_id AND (c.category_id = cd.category_id OR c.parent_id = cd.category_id) AND cd.language_id = '1' AND c.status > 0) AS categories,
(SELECT p.manufacturer_id FROM `oc_product` p WHERE op.product_id = p.product_id) AS manufacturer_id,
(SELECT m.name FROM `oc_manufacturer` m, `oc_product` p WHERE p.manufacturer_id = m.manufacturer_id AND op.product_id = p.product_id) AS manufacturer,
(SELECT GROUP_CONCAT(CONCAT(agd.name,' > ',ad.name,' > ',pa.text) ORDER BY agd.name, ad.name, pa.text ASC SEPARATOR '<br>') FROM `oc_product_attribute` pa, `oc_attribute_description` ad, `oc_attribute` a, `oc_attribute_group_description` agd WHERE pa.language_id = '1' AND pa.product_id = op.product_id AND pa.attribute_id = ad.attribute_id AND ad.language_id = '1' AND ad.attribute_id = a.attribute_id AND a.attribute_group_id = agd.attribute_group_id AND agd.language_id = '1') AS attribute,
(SELECT p.status FROM `oc_product` p WHERE op.product_id = p.product_id) AS status,
(SELECT p.location FROM `oc_product` p WHERE op.product_id = p.product_id) AS location,
(SELECT tc.title FROM `oc_tax_class` tc, `oc_product` p WHERE op.product_id = p.product_id AND p.tax_class_id = tc.tax_class_id) AS tax_class,
(SELECT p.price FROM `oc_product` p WHERE op.product_id = p.product_id) AS prod_price,
(SELECT p.cost FROM `oc_product` p WHERE op.product_id = p.product_id) AS prod_cost,
(SELECT p.viewed FROM `oc_product` p WHERE op.product_id = p.product_id) AS viewed,
(SELECT p.quantity FROM `oc_product` p WHERE op.product_id = p.product_id) AS stock_quantity, '' AS stock_oquantity,
'' AS options, SUM(op.quantity) AS sold_quantity,
op.price,
SUM(op.total) AS total_excl_vat,
SUM(op.tax*op.quantity) AS total_tax,
SUM(op.total+(op.tax*op.quantity)) AS total_incl_vat,
SUM((op.base_price-op.price)*op.quantity) AS discount,
SUM((SELECT SUM(op.price*r.quantity) FROM `oc_return` r WHERE r.product_id = op.product_id AND r.order_id = op.order_id AND r.return_action_id = '' GROUP BY r.product_id)) AS refunds,
SUM((SELECT op.reward FROM `oc_order` o WHERE op.order_id = o.order_id AND o.customer_id > 0)) AS reward_points,
SUM(op.total) AS total_sales,
SUM((op.cost*op.quantity)) AS total_costs,
SUM(((op.total)-((op.cost*op.quantity)))) AS total_profit FROM `oc_order` o INNER JOIN `oc_order_product` op ON (o.order_id = op.order_id) WHERE (DATE(o.date_added) < '2015-09-19') AND o.order_status_id > 0 AND op.product_id IN (SELECT op.product_id FROM `oc_order` o, `oc_order_product` op WHERE o.order_status_id > 0 AND o.order_id = op.order_id AND (DATE(o.date_added) >= '1970-01-01') AND (DATE(o.date_added) < '2015-09-19')) AND op.product_id NOT IN (SELECT op.product_id FROM `oc_order` o, `oc_order_product` op WHERE o.order_status_id > 0 AND o.order_id = op.order_id AND (DATE(o.date_added) >= '2015-09-19')) GROUP BY op.product_id ORDER BY sold_quantity DESC, total_sales DESC LIMIT 0,25;
------------------------------------------------------------------------------------------------------------

You can see that the Query_time is 124 seconds, is examining 7165101 rows and returning only 17 lines every time you run the module.

The wait_timeout set for your MySQL server is 28800 second which is 8 hours:

mysql> show variables like '%_timeout';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| connect_timeout | 10 |
| delayed_insert_timeout | 300 |
| innodb_lock_wait_timeout | 50 |
| innodb_rollback_on_timeout | OFF |
| interactive_timeout | 28800 |
| net_read_timeout | 28800 |
| net_write_timeout | 28800 |
| slave_net_timeout | 3600 |
| table_lock_wait_timeout | 50 |
| wait_timeout | 28800 |


You can see from the results below how much sort_scans are performed during the query execution:

mysql> SHOW STATUS LIKE 'Sort_%';
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| Sort_merge_passes | 0 |
| Sort_range | 0 |
| Sort_rows | 213 |
| Sort_scan | 6136 |
+-------------------+-------+

In other words this is very badly designed MySQL query which will not be able to get executed on your server.

 

I am surprised it runs as quickly as it does given all those dependent subqueries. For every record read you perform 15+ subqueries.

 

Get rid of those subqueries and use JOINS.

I am surprised it runs as quickly as it does given all those dependent subqueries. For every record read you perform 15+ subqueries.

 

Get rid of those subqueries and use JOINS.

Right that makes sense, the sever we use is a dedicated server so I was shocked to see the error.

 

I will contact the vendor to state he should use joins not sub queries. 

 

What does he mean by 

 

 

 

You can see from the results below how much sort_scans are performed during the query execution

 

What is this and what affect does it have?

The Sort Scan is when you load a (hopefully) indexed table into memory. it will sort that table by the relevant columns that are indexed in the table - or MySQL will take it's best guess of it's not indexed.  Each time you have a sub query requesting a different subset of data from the table the system will reload the relevant subsection of the original table and reorder it to suit that request...It's one of the biggest problems with this type of query, as you can appreciate.  Using joins allows MySQL to load in all columns and indexes as a mapped entity and load in aggregate subqueries as associative branches of it, rather than having to build each subqueries dataset independently and then append the results into temp tables for use in the overall query.  Another issue is that each of your SELECT subqueries are nested within an overall SELECT * which is just about as bonkers as the fact that (at a glance) 10 of the subqueries are against the same table.

 

Whoever wrote that needs not to get payed.

Thank everyone for you input. 

 

 

The Sort Scan is when you load a (hopefully) indexed table into memory. it will sort that table by the relevant columns that are indexed in the table - or MySQL will take it's best guess of it's not indexed.  Each time you have a sub query requesting a different subset of data from the table the system will reload the relevant subsection of the original table and reorder it to suit that request...It's one of the biggest problems with this type of query, as you can appreciate.  Using joins allows MySQL to load in all columns and indexes as a mapped entity and load in aggregate subqueries as associative branches of it, rather than having to build each subqueries dataset independently and then append the results into temp tables for use in the overall query.  Another issue is that each of your SELECT subqueries are nested within an overall SELECT * which is just about as bonkers as the fact that (at a glance) 10 of the subqueries are against the same table.

 

Whoever wrote that needs not to get payed.


Unfortunately it has been paid for, fortunately it was an opencart extensive that wasn't very expensive.

 

 

 

 

 Another issue is that each of your SELECT subqueries are nested within an overall SELECT * which is just about as bonkers as the fact that (at a glance) 10 of the subqueries are against the same table.

 

Why is this so bad? 

 

Is there any topics or resource you could recommend that I read to understand how to create better SQL queries? 

 

I'm new to SQL and I'm willing to learn what ever I need to, to make sure that all future SQL queries I make do not include such practices. 

Using SELECT * is like letting go of the steering wheel while driving - it's not an issue while the road is straight, but sooner or later there will be a corner, it's all about control.  Keep control at all times, always explicitly list columns.

Also, by listing columns you allow yourself to alias them in the SELECT.  This means that, regardless of the table structure, you can pick and choose the titles of the columns as they will appear in the query output.

This goes hand in hand with the fact that listing the columns in the SELECT makes it much easier to establish exactly what the query is bringing back without having to look through the whole FROM and WHERE sections.

 

It used to be that SELECT * did funny things to index performance (like ignoring there were any indexes at all) as well, but I expect that's all been sorted now (It's been a while since I used SELECT * for anything other than debugging).

 

NOTE: Always make aliases relevant.  It's all good and well having single letters for tables and all, but when you look back at the code in the future to change or add to it, it makes it far more difficult to establish what's going on if you use "customer_current_active AS c" rather than "customer_current_active AS cust"

 

As for the running of 10 sub-queries against the same table individually: they are all hitting the same dataset, so could be run as a single query.  Properly you would JOIN the tables, as Barand said, and only use subqueries for either pulling out a limited dataset from a huge table to apply the JOIN to instead, or to run any aggregate functions against the table that need to be pre-established to work with the final dataset.

 

There are plenty of resources online if you search for "writing efficient SQL queries" and the like.  There's also likely to be a couple of good books on the matter too, and - if you're feeling flush - Oracle offer expensive training courses on DBA.

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.