Jump to content


Photo

optimizing a query


  • Please log in to reply
34 replies to this topic

#1 jworisek

jworisek
  • Members
  • PipPipPip
  • Advanced Member
  • 112 posts

Posted 21 February 2006 - 09:45 PM

I have to admit that I don't have extensive experience with multi table queries, so I thought some of you guys might be able to give me some tips here.

Im trying to generate a report of orders we have that still have line items waiting to be shipped by the date we have committed to the customer. Incoming line items are stored in their own table. Shipped items are stored in another table and we almost always do multiple shipments. The problem that I have is that the first query is looking for the order_id #s that I need to find these sums and determine if there is any remaining.

Ideally I would like to do it in just one query, but Im not sure how to do this without having an if/else query after the result is returned to determine if the remaining items are zero or not.

Im using MySQL 4.1.5 on XP.

#2 wickning1

wickning1
  • Members
  • PipPipPip
  • Advanced Member
  • 405 posts

Posted 21 February 2006 - 10:01 PM

It's hard to reply without knowing a little more about your setup, but it sounds like a fairly simple join to do in one query.

This will give you the equivalent of select * from orders but limited to orders with outstanding line items:
SELECT DISTINCT o.* FROM orders o, outstandinglineitems i WHERE o.orderno=i.orderno


#3 jworisek

jworisek
  • Members
  • PipPipPip
  • Advanced Member
  • 112 posts

Posted 21 February 2006 - 10:18 PM

I think you misunderstood what I was trying to say... Heres basically what the tables in question look like (again, somewhat simplified)

incoming_data:
order_id int, line_items_received int, line_info_id int

Shipping Data:
order_id int, shipping_num tinyint, line_items_shipped int, line_info_id int, commit_date date

Orders
order_id int, customer_id tinyint, order_number int, etc...

there is no outstanding line items table, the outstanding line items are generated by summing the incoming and substracting the outgoing.

I can use something like this:

SELECT sum(line_tems_received)-sum(line_items_shipped) as remaining_items,... from orders O, incoming_data ID, shipping_data SD where O.commit_date>=subdate('2006-02-21',interval 14 DAY) AND O.order_id=ID.order_id AND O.order_id=SD.order_id group by order_id

But I want to have the query determine if there are items remaining before returning data from the database.

#4 shoz

shoz
  • Staff Alumni
  • Advanced Member
  • 600 posts

Posted 21 February 2006 - 11:12 PM

Assuming the rest of the query is fine, try adding "HAVING remaining_items > 0" to the end. If you can be sure that the database will be consistent with received items being always larger than shipped items, you can remove the "> 0".
[!--sql--][div class=\'sqltop\']SQL[/div][div class=\'sqlmain\'][!--sql1--]...group by order_id HAVING remaining_items > 0
[!--sql2--][/div][!--sql3--]

#5 jworisek

jworisek
  • Members
  • PipPipPip
  • Advanced Member
  • 112 posts

Posted 21 February 2006 - 11:23 PM

That just may work... the only other thing that I just now thought of is will the whole query fail if there is no entry in the shipping_data table yet for that order_id?

I would test it out now but I'm on my way out the door.

#6 shoz

shoz
  • Staff Alumni
  • Advanced Member
  • 600 posts

Posted 22 February 2006 - 12:35 AM

the only other thing that I just now thought of is will the whole query fail if there is no entry in the shipping_data table yet for that order_id?

[!--sql--][div class=\'sqltop\']SQL[/div][div class=\'sqlmain\'][!--sql1--][span style=\'color:blue;font-weight:bold\']SELECT[/span]
sum(line_items_received)-sum(IFNULL(line_items_shipped, 0)) as remaining_items
FROM orders O
INNER JOIN incoming_data ID
ON
ID.order_id = O.order_id
LEFT JOIN shipping_data SD
ON
SD.order_id = O.order_id
WHERE O.commit_date>=subdate('2006[span style=\'color:orange\']-02-21'[/span],interval 14 DAY)
GROUP BY
order_id
HAVING
remaining_items > 0 [!--sql2--][/div][!--sql3--]
I assume that orders without "incomming_data" should not be considered. While, orders without "shipping_data" should.

#7 jworisek

jworisek
  • Members
  • PipPipPip
  • Advanced Member
  • 112 posts

Posted 22 February 2006 - 03:49 PM

well, I did get that query to run ( Ihaven't verified if it is accurately pulling the right rows yet though) but the problem I have is that it is incredibly slow. The first time its run it will time out (30seconds) and then next time it will be instantaneous (I'm assuming most of it was cached). Is there some option that I need to increase in my my.ini file to add more memory? I suppose it could be tables joined in a poor way (performance wise). I know that you can run an explain select query from command line, but on windows I can't get the entire query in before it cuts off. Is there a way to echo back the results of the explain select using PHP or another way to do it from command line?

The exact query I am using is posted below... I had simplified it greatly hoping to not have to write an essay on all that was involved ;)

$sql="SELECT O.order_id, sum(required_pieces)-sum(IFNULL(good_pieces+chipped_pieces+reject_pieces,0)) as pcs_remain from committed_new CN INNER JOIN orders O ON (O.order_id=CN.order_id) INNER JOIN line_info LI ON (LI.order_id=O.order_id) INNER JOIN incoming_data ID ON (ID.line_info_id=LI.line_info_id) LEFT JOIN shipping S ON (S.order_id=O.order_id) LEFT JOIN shipping_data SD ON (SD.shipping_id=S.shipping_id) where CN.current_commit_date>='2006-02-20' group by order_id having pcs_remain>0";

Committed_new contains the date I am checking for (indexed)
Line_info contains specific data explaining the line items
Shipping contains info on the non-line item specific shipping info

Everything except the shipping and shipping_data tables are guaranteed to have data in them.

#8 shoz

shoz
  • Staff Alumni
  • Advanced Member
  • 600 posts

Posted 22 February 2006 - 07:00 PM

First, I'm guessing to some extent how the tables are related. I'd be better able to give an accurate query(seperate from optimizing it), if I had a SMALL dump of an example set of tables or the one you're using. Someone else may be fine with what was posted previously, but I'd like to have something to look at.

The relation between the tables from what one of your previous posts suggests, may mean that the query needs to be changed. Feel free to make up data.

My concern is that the recv values and or the shipped values will be repeated on the join making the SUM() inaccurate. If you're sure that the original SUM query you posted gives an accurate result, you shouldn't have to worry however.

As for the speed. You can first try changing the SUM part to the following.
[!--sql--][div class=\'sqltop\']SQL[/div][div class=\'sqlmain\'][!--sql1--]sum(required_pieces)-IFNULL(sum(good_pieces+chipped_pieces+reject_pieces), 0) as pcs_remain [!--sql2--][/div][!--sql3--]
Since someone mentions problems with IFNULL on mysql.com, you can also try
[!--sql--][div class=\'sqltop\']SQL[/div][div class=\'sqlmain\'][!--sql1--]sum(required_pieces)-IF(ISNULL(sum(good_pieces+chipped_pieces+reject_pieces)), 0, sum(good_pieces+chipped_pieces+reject_pieces)) as pcs_remain [!--sql2--][/div][!--sql3--]
or
[!--sql--][div class=\'sqltop\']SQL[/div][div class=\'sqlmain\'][!--sql1--]sum(required_pieces)-IF(NULL <=> SUM(good_pieces+chipped_pieces+reject_pieces),0,SUM(good_pieces+chipped_pieces+reject_pieces)) as pcs_remain [!--sql2--][/div][!--sql3--]
I'd then make sure all order_id columns are indexed in each table to see if that helps. I assume the other id's in the query are the primary keys or are indexed in some way and you've said that the "current_commit_date" is already indexed.

If none of these work try removing the SUM part of the query in it's entirety to see if the query is still slow. Then you can determine if it's the SUM causing the problem or some other part of the join.

To get the result of the EXPLAIN using PHP the following should work
$query = "EXPLAIN SELECT etc";
$result = mysql_query($query) or die(mysql_error()."\n".$query);
while ($row = mysql_fetch_assoc($result))
{
    foreach ($row as $key => $value)
    {
        print $key.': '.$value."<br />\n";
    }
}


#9 shoz

shoz
  • Staff Alumni
  • Advanced Member
  • 600 posts

Posted 22 February 2006 - 07:39 PM

This is a basic example of the query that I'm thinking may be better btw.

SELECT
t1.order_id, (t1.sum_recv - IF(ISNULL(t2. sum_shipped), 0, t2.sum_shipped)) AS outstanding
FROM
    (
    SELECT
    order_id, SUM(recv) AS sum_recv
    FROM 
    recv_table
    GROUP BY order_id
    ) AS t1
LEFT JOIN
    (
    SELECT
    order_id, SUM(shipped) as sum_shipped
    FROM
    shipped_table
    GROUP BY order_id
    ) AS t2
ON t1.order_id = t2.order_id
WHERE
t1.sum_recv -t2. sum_shipped > 0


#10 jworisek

jworisek
  • Members
  • PipPipPip
  • Advanced Member
  • 112 posts

Posted 22 February 2006 - 07:59 PM


First off, thanks for all the help on this. Unfortunately I might not be able to get to this until tomorrow morning...

One thing that I was thinking that might also be slowing it down some is having to check the shipped table when shipments have not been made for that order... Or maybe not. Just a thought.

Do you know any good references for writing and optimizing multi-table queries? Most of my work before this has been with single tables with many columns (based on more of a tree structure of files with each customer having their own DB and all of our products/services as tables. Now I've moved it to a relational table system and am starting to get into queries that involve possibly 10 tables and I really don't have experience in getting them to run fast. I'll try that explain select sample code out though, that could be a big help!

#11 shoz

shoz
  • Staff Alumni
  • Advanced Member
  • 600 posts

Posted 23 February 2006 - 03:06 PM

Do you know any good references for writing and optimizing multi-table queries?

[a href=\"http://dev.mysql.com/doc/refman/4.1/en/query-speed.html\" target=\"_blank\"]http://dev.mysql.com/doc/refman/4.1/en/query-speed.html[/a]
[a href=\"http://hackmysql.com/case5\" target=\"_blank\"]http://hackmysql.com/case5[/a]

#12 jworisek

jworisek
  • Members
  • PipPipPip
  • Advanced Member
  • 112 posts

Posted 23 February 2006 - 04:28 PM

One large problem was that I was missing an important index in incoming_data (its was checking over 34,000 rows just from that one table). Now that I added a couple more indexes, even my old script with a bunch of seperate queries runs in under a second. This was one of those things that I never would have realized without being able to check the explain selects through php.

Also, those links that you provided are just what I was looking for. I just needed to see some examples with explanation to get a feel for it. Again, thanks for the help! :)

One last quick question though...

I have noticed that the "committed_new" table is still searching every single row and using a temporary table. I get this when I do the explain:

id:1
select_type: SIMPLE
table: CN
type: ALL
possible_keys: current_commit_date
key:
key:len:
ref:
rows 669
Extra: Using where; Using Temporary; Using filesort

with the following query

SELECT O.order_id, sum(required_pieces)-IF(ISNULL(sum(good_pieces+chipped_pieces+reject_pieces)),0,sum(good_pieces+chipped_pieces+reject_pieces)) as pcs_remain from committed_new CN USE INDEX (current_commit_date) INNER JOIN orders O ON (O.order_id=CN.order_id) INNER JOIN line_info LI ON (LI.order_id=O.order_id) INNER JOIN incoming_data ID ON (ID.line_info_id=LI.line_info_id) LEFT JOIN shipping S ON (S.order_id=O.order_id) LEFT JOIN shipping_data SD ON (SD.shipping_id=S.shipping_id) where CN.current_commit_date>=DATE_SUB(curdate(), INTERVAL 14 DAY) group by order_id having pcs_remain>0

Another problem is that it sums up the incoming multiple times... if there has been 3 shipments then the incoming is summed up 3 times... but it runs fast ;) I'll give that last query a check with the subselects to pull the sums out.

#13 shoz

shoz
  • Staff Alumni
  • Advanced Member
  • 600 posts

Posted 23 February 2006 - 06:22 PM

I have noticed that the "committed_new" table is still searching every single row and using a temporary table.

See if adding the index (current_commit_date,order_id) helps. You might also try (order_id, current_commit_date).

Btw, this is the example query I posted earlier, rewritten based on the actual tables you show in your query. Which I think may be what you should use as mentioned previously. You can check the accuracy.

SELECT
CN.order_id,
sum_required_pieces-IF(ISNULL(sum_other_pieces), 0 sum_other_pieces) AS pcs_remain
FROM
committed_new AS CN
INNER JOIN
    (
    SELECT
    LI.order_id, order_id, SUM(required_pieces) AS sum_required_pieces
    FROM
    line_info AS LI
    INNER JOIN
    incoming_data AS ID
    ON 
    LI.line_info_id=ID.line_info_id
    GROUP BY LI.order_id
    ) AS recv_tbl
ON CN.order_id=recv_tbl.order_id
LEFT JOIN
    (
    SELECT
    S.order_id, SUM(good_pieces+chipped_pieces+reject_pieces) AS sum_other_pieces
    FROM
    shipping S
    INNER JOIN
    shipping_data SD
    ON
    S.shipping_id=SD.shipping_id
    GROUP BY S.order_id
    ) AS ship_tbl
ON recv_tbl.order_id=ship_tbl.order_id
WHERE
CN.current_commit_date>=DATE_SUB(curdate(), INTERVAL 14 DAY)
GROUP BY
CN.order_id
HAVING pcs_remain>0

EDIT:
It may be faster to have the committed_new table as part of the subselects. Rather than outside of them.
So you'd have something like

SELECT recv_tbl.order_id, sum_etc - sum_etc
    (
    SELECT
    LI.order_id, order_id, SUM(required_pieces) AS sum_required_pieces
    FROM
    committed_new AS CN
    INNER JOIN
    line_info AS LI
    ON CN.order_id=LI.order_id
    INNER JOIN
    incoming_data AS ID
    ON 
    LI.line_info_id=ID.line_info_id
        WHERE
        CN.date >= etc
    GROUP BY LI.order_id
    ) AS recv_tbl
    LEFT JOIN
    (
    SELECT S.order_id
    FROM
    committed_new etc
    etc
    )


#14 jworisek

jworisek
  • Members
  • PipPipPip
  • Advanced Member
  • 112 posts

Posted 23 February 2006 - 07:39 PM

I tried the top one and it fixes the problem with checking the entire table for committed_new, unfortunately it doesnt use keys for the incoming_data or shipping_data tables so it scans the entire tables and uses temporary tables.

One quick fix that I can do is to divide the incoming sum by the number of shipments to account for the additional summations.

#15 jworisek

jworisek
  • Members
  • PipPipPip
  • Advanced Member
  • 112 posts

Posted 23 February 2006 - 07:56 PM

I think I might just take the easy route and do the incoming sum in the main query and then have a query after that to look up the information for the shipped sum. Just adding those indexes alone has sped up the time it takes to print each line of the report by 40xor more so there really won't be too much performance hit by breaking up the query. My benchmark report just ran in .646 where it was over 20 seconds before.

#16 shoz

shoz
  • Staff Alumni
  • Advanced Member
  • 600 posts

Posted 23 February 2006 - 08:13 PM

Try the second query which uses the commited_new table in the subselects. The reasoning behind it is that when the SUBSELECT does the SUMs as you've seen, it does the SUM and GROUP BY for the entire table. While, if you have the commited_new table in the SUBSELECTs with the WHERE clause to find the order_ids in the CN table to JOIN ON, you only have to SUM for those rows/ids.

Remember to add the (commited_date,order_id) index.
SELECT
recv_tbl.order_id, sum_required_pieces-IF(ISNULL(sum_other_pieces), 0 sum_other_pieces) AS pcs_remain
FROM
    (
    SELECT
    CN.order_id, SUM(required_pieces) AS sum_required_pieces
    FROM
    committed_new AS CN
    INNER JOIN
    line_info AS LI
    ON CN.order_id=ID.order_id
    INNER JOIN
    incoming_data AS ID
    ON
    LI.line_info_id=ID.line_info_id
    WHERE CN.current_commit_date>=DATE_SUB(curdate(), INTERVAL 14 DAY)
    GROUP BY CN.order_id
    ) AS recv_tbl
LEFT JOIN
    (
    SELECT
    CN.order_id, SUM(good_pieces+chipped_pieces+reject_pieces) AS sum_other_pieces
    FROM
    committed_new AS CN
    INNER JOIN
    shipping S
    ON CN.order_id = S.order_id
    INNER JOIN
    shipping_data SD
    ON
    S.shipping_id=SD.shipping_id
    WHERE
    CN.current_commit_date>=DATE_SUB(curdate(), INTERVAL 14 DAY)
    GROUP BY CN.order_id
    ) AS ship_tbl
ON recv_tbl.order_id=ship_tbl.order_id
WHERE sum_required_pieces-IF(ISNULL(sum_other_pieces), 0 sum_other_pieces)  > 0
EDIT: Changed the query.

#17 jworisek

jworisek
  • Members
  • PipPipPip
  • Advanced Member
  • 112 posts

Posted 23 February 2006 - 08:39 PM

I updated to try that latest query and at least with preliminary checks it seems to be working correctly...

It still is trying to call the entire committed_new table, only now it does it once for each subselect. heres the explain select on it:

id: 1
select_type: PRIMARY
table: 
type: ALL
possible_keys: 
key: 
key_len: 
ref: 
rows: 695
Extra: 

id: 1
select_type: PRIMARY
table: 
type: ALL
possible_keys: 
key: 
key_len: 
ref: 
rows: 382
Extra: 

id: 3
select_type: DERIVED
table: CN
type: ALL
possible_keys: 
key: 
key_len: 
ref: 
rows: 704
Extra: Using temporary; Using filesort

id: 3
select_type: DERIVED
table: S
type: ref
possible_keys: PRIMARY,order_id
key: order_id
key_len: 4
ref: CN.order_id
rows: 2
Extra: 

id: 3
select_type: DERIVED
table: SD
type: ref
possible_keys: shipping_id
key: shipping_id
key_len: 4
ref: S.shipping_id
rows: 3
Extra: 

id: 2
select_type: DERIVED
table: CN
type: ALL
possible_keys: current_commit_date
key: 
key_len: 
ref: 
rows: 704
Extra: Using where; Using temporary; Using filesort

id: 2
select_type: DERIVED
table: LI
type: ref
possible_keys: PRIMARY,line_info
key: PRIMARY
key_len: 4
ref: CN.order_id
rows: 3
Extra: 

id: 2
select_type: DERIVED
table: ID
type: ref
possible_keys: line_info_id
key: line_info_id
key_len: 4
ref: LI.line_info_id
rows: 1
Extra:


#18 shoz

shoz
  • Staff Alumni
  • Advanced Member
  • 600 posts

Posted 23 February 2006 - 09:22 PM

Post the create statement for the table "committed_new" and the indexes on the table.
[!--sql--][div class=\'sqltop\']SQL[/div][div class=\'sqlmain\'][!--sql1--]SHOW CREATE TABLE committed_new;
SHOW INDEX FROM committed_new; [!--sql2--][/div][!--sql3--]
Seeing how the first query returns one value for each order_id, you should be able to do away with the second SUBSELECT and do the JOIN directly on the shipping table without problems. It may be faster.You shouldnt notice any difference in the results from this query versus the previous one.
SELECT
recv_tbl.order_id, sum_required_pieces-IF(ISNULL(SUM(SD.good_pieces+SD.chipped_pieces+SD.reject_pieces)),0,sum(SD.good_pieces+SD.chipped_pieces+SD.reject_pieces)) AS pcs_remain
FROM
    (
    SELECT
    CN.order_id, SUM(required_pieces) AS sum_required_pieces
    FROM
    committed_new AS CN
    INNER JOIN
    line_info AS LI
    ON CN.order_id=ID.order_id
    INNER JOIN
    incoming_data AS ID
    ON
    LI.line_info_id=ID.line_info_id
    WHERE CN.current_commit_date>=DATE_SUB(curdate(), INTERVAL 14 DAY)
    GROUP BY CN.order_id
    ) AS recv_tbl
LEFT JOIN
shipping S
ON recv_tbl.order_id = S.order_id
INNER JOIN
shipping_data SD
ON
S.shipping_id=SD.shipping_id
GROUP BY
recv_tbl.order_id
HAVING
pcs_remain > 0


#19 jworisek

jworisek
  • Members
  • PipPipPip
  • Advanced Member
  • 112 posts

Posted 23 February 2006 - 09:32 PM

| committed_new | CREATE TABLE `committed_new` (
  `committed_id` int(11) NOT NULL auto_increment,
  `order_id` int(11) NOT NULL default '0',
  `line_info_id` int(11) NOT NULL default '0',
  `initial_commit_date` date NOT NULL default '0000-00-00',
  `current_commit_date` date NOT NULL default '0000-00-00',
  `comments` varchar(200) NOT NULL default '',
  `history` text NOT NULL,
  PRIMARY KEY  (`committed_id`),
  UNIQUE KEY `order_id` (`order_id`,`line_info_id`),
  KEY `line_info_id` (`line_info_id`),
  KEY `current_commit_date` (`current_commit_date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |

+---------------+------------+---------------------+--------------+---------------------+-----------+-------------+-----
-----+--------+------+------------+---------+
| Table         | Non_unique | Key_name            | Seq_in_index | Column_name         | Collation | Cardinality | Sub_
part | Packed | Null | Index_type | Comment |
+---------------+------------+---------------------+--------------+---------------------+-----------+-------------+-----
-----+--------+------+------------+---------+
| committed_new |          0 | PRIMARY             |            1 | committed_id        | A         |         704 |
NULL | NULL   |      | BTREE      |         |
| committed_new |          0 | order_id            |            1 | order_id            | A         |        NULL |
NULL | NULL   |      | BTREE      |         |
| committed_new |          0 | order_id            |            2 | line_info_id        | A         |         704 |
NULL | NULL   |      | BTREE      |         |
| committed_new |          1 | line_info_id        |            1 | line_info_id        | A         |          10 |
NULL | NULL   |      | BTREE      |         |
| committed_new |          1 | current_commit_date |            1 | current_commit_date | A         |          39 |
NULL | NULL   |      | BTREE      |         |
+---------------+------------+---------------------+--------------+---------------------+-----------+-------------+-----
-----+--------+------+------------+---------+


#20 shoz

shoz
  • Staff Alumni
  • Advanced Member
  • 600 posts

Posted 23 February 2006 - 09:42 PM

Try
[!--sql--][div class=\'sqltop\']SQL[/div][div class=\'sqlmain\'][!--sql1--]ALTER TALE committed_new ADD INDEX date_order_id (current_commit_date, order_id);
[!--sql2--][/div][!--sql3--]




0 user(s) are reading this topic

0 members, 0 guests, 0 anonymous users