jworisek Posted February 21, 2006 Share Posted February 21, 2006 I have to admit that I don't have extensive experience with multi table queries, so I thought some of you guys might be able to give me some tips here.Im trying to generate a report of orders we have that still have line items waiting to be shipped by the date we have committed to the customer. Incoming line items are stored in their own table. Shipped items are stored in another table and we almost always do multiple shipments. The problem that I have is that the first query is looking for the order_id #s that I need to find these sums and determine if there is any remaining.Ideally I would like to do it in just one query, but Im not sure how to do this without having an if/else query after the result is returned to determine if the remaining items are zero or not.Im using MySQL 4.1.5 on XP. Quote Link to comment Share on other sites More sharing options...
wickning1 Posted February 21, 2006 Share Posted February 21, 2006 It's hard to reply without knowing a little more about your setup, but it sounds like a fairly simple join to do in one query.This will give you the equivalent of select * from orders but limited to orders with outstanding line items:[code]SELECT DISTINCT o.* FROM orders o, outstandinglineitems i WHERE o.orderno=i.orderno[/code] Quote Link to comment Share on other sites More sharing options...
jworisek Posted February 21, 2006 Author Share Posted February 21, 2006 I think you misunderstood what I was trying to say... Heres basically what the tables in question look like (again, somewhat simplified)incoming_data:order_id int, line_items_received int, line_info_id intShipping Data:order_id int, shipping_num tinyint, line_items_shipped int, line_info_id int, commit_date dateOrdersorder_id int, customer_id tinyint, order_number int, etc...there is no outstanding line items table, the outstanding line items are generated by summing the incoming and substracting the outgoing.I can use something like this:[code]SELECT sum(line_tems_received)-sum(line_items_shipped) as remaining_items,... from orders O, incoming_data ID, shipping_data SD where O.commit_date>=subdate('2006-02-21',interval 14 DAY) AND O.order_id=ID.order_id AND O.order_id=SD.order_id group by order_id[/code]But I want to have the query determine if there are items remaining before returning data from the database. Quote Link to comment Share on other sites More sharing options...
shoz Posted February 21, 2006 Share Posted February 21, 2006 Assuming the rest of the query is fine, try adding "HAVING remaining_items > 0" to the end. If you can be sure that the database will be consistent with received items being always larger than shipped items, you can remove the "> 0".[!--sql--][div class=\'sqltop\']SQL[/div][div class=\'sqlmain\'][!--sql1--]...group by order_id HAVING remaining_items [color=orange]>[/color] 0[!--sql2--][/div][!--sql3--] Quote Link to comment Share on other sites More sharing options...
jworisek Posted February 21, 2006 Author Share Posted February 21, 2006 That just may work... the only other thing that I just now thought of is will the whole query fail if there is no entry in the shipping_data table yet for that order_id? I would test it out now but I'm on my way out the door. Quote Link to comment Share on other sites More sharing options...
shoz Posted February 22, 2006 Share Posted February 22, 2006 [quote]the only other thing that I just now thought of is will the whole query fail if there is no entry in the shipping_data table yet for that order_id? [/quote][!--sql--][div class=\'sqltop\']SQL[/div][div class=\'sqlmain\'][!--sql1--][span style=\'color:blue;font-weight:bold\']SELECT[/span][color=blue]sum[/color](line_items_received)[color=orange]-[/color][color=blue]sum[/color](IFNULL(line_items_shipped, 0)) [color=green]as[/color] remaining_items[color=green]FROM[/color] [color=orange]orders[/color] OINNER [color=green]JOIN[/color] incoming_data IDONID.order_id [color=orange]=[/color] O.order_id[color=green]LEFT[/color] [color=green]JOIN[/color] shipping_data SDONSD.order_id [color=orange]=[/color] O.order_id[color=green]WHERE[/color] O.commit_date[color=orange]>[/color][color=orange]=[/color]subdate([color=red]'2006[span style=\'color:orange\']-[/color]02[color=orange]-[/color]21'[/span],interval 14 DAY)GROUP BYorder_idHAVINGremaining_items [color=orange]>[/color] 0 [!--sql2--][/div][!--sql3--]I assume that orders without "incomming_data" should not be considered. While, orders without "shipping_data" should. Quote Link to comment Share on other sites More sharing options...
jworisek Posted February 22, 2006 Author Share Posted February 22, 2006 well, I did get that query to run ( Ihaven't verified if it is accurately pulling the right rows yet though) but the problem I have is that it is incredibly slow. The first time its run it will time out (30seconds) and then next time it will be instantaneous (I'm assuming most of it was cached). Is there some option that I need to increase in my my.ini file to add more memory? I suppose it could be tables joined in a poor way (performance wise). I know that you can run an explain select query from command line, but on windows I can't get the entire query in before it cuts off. Is there a way to echo back the results of the explain select using PHP or another way to do it from command line?The exact query I am using is posted below... I had simplified it greatly hoping to not have to write an essay on all that was involved ;)[code]$sql="SELECT O.order_id, sum(required_pieces)-sum(IFNULL(good_pieces+chipped_pieces+reject_pieces,0)) as pcs_remain from committed_new CN INNER JOIN orders O ON (O.order_id=CN.order_id) INNER JOIN line_info LI ON (LI.order_id=O.order_id) INNER JOIN incoming_data ID ON (ID.line_info_id=LI.line_info_id) LEFT JOIN shipping S ON (S.order_id=O.order_id) LEFT JOIN shipping_data SD ON (SD.shipping_id=S.shipping_id) where CN.current_commit_date>='2006-02-20' group by order_id having pcs_remain>0";[/code]Committed_new contains the date I am checking for (indexed)Line_info contains specific data explaining the line itemsShipping contains info on the non-line item specific shipping infoEverything except the shipping and shipping_data tables are guaranteed to have data in them. Quote Link to comment Share on other sites More sharing options...
shoz Posted February 22, 2006 Share Posted February 22, 2006 First, I'm guessing to some extent how the tables are related. I'd be better able to give an accurate query(seperate from optimizing it), if I had a SMALL dump of an example set of tables or the one you're using. Someone else may be fine with what was posted previously, but I'd like to have something to look at.The relation between the tables from what one of your previous posts suggests, may mean that the query needs to be changed. Feel free to make up data.My concern is that the recv values and or the shipped values will be repeated on the join making the SUM() inaccurate. If you're sure that the original SUM query you posted gives an accurate result, you shouldn't have to worry however.As for the speed. You can first try changing the SUM part to the following.[!--sql--][div class=\'sqltop\']SQL[/div][div class=\'sqlmain\'][!--sql1--][color=blue]sum[/color](required_pieces)[color=orange]-[/color]IFNULL([color=blue]sum[/color](good_pieces[color=orange]+[/color]chipped_pieces[color=orange]+[/color]reject_pieces), 0) [color=green]as[/color] pcs_rema[color=green]in[/color] [!--sql2--][/div][!--sql3--]Since someone mentions problems with IFNULL on mysql.com, you can also try[!--sql--][div class=\'sqltop\']SQL[/div][div class=\'sqlmain\'][!--sql1--][color=blue]sum[/color](required_pieces)[color=orange]-[/color]IF(ISNULL([color=blue]sum[/color](good_pieces[color=orange]+[/color]chipped_pieces[color=orange]+[/color]reject_pieces)), 0, [color=blue]sum[/color](good_pieces[color=orange]+[/color]chipped_pieces[color=orange]+[/color]reject_pieces)) [color=green]as[/color] pcs_rema[color=green]in[/color] [!--sql2--][/div][!--sql3--]or[!--sql--][div class=\'sqltop\']SQL[/div][div class=\'sqlmain\'][!--sql1--][color=blue]sum[/color](required_pieces)[color=orange]-[/color]IF(NULL [color=orange]<[/color][color=orange]=[/color][color=orange]>[/color] [color=blue]SUM[/color](good_pieces[color=orange]+[/color]chipped_pieces[color=orange]+[/color]reject_pieces),0,[color=blue]SUM[/color](good_pieces[color=orange]+[/color]chipped_pieces[color=orange]+[/color]reject_pieces)) [color=green]as[/color] pcs_rema[color=green]in[/color] [!--sql2--][/div][!--sql3--]I'd then make sure all order_id columns are indexed in each table to see if that helps. I assume the other id's in the query are the primary keys or are indexed in some way and you've said that the "current_commit_date" is already indexed.If none of these work try removing the SUM part of the query in it's entirety to see if the query is still slow. Then you can determine if it's the SUM causing the problem or some other part of the join.To get the result of the EXPLAIN using PHP the following should work[code]$query = "EXPLAIN SELECT etc";$result = mysql_query($query) or die(mysql_error()."\n".$query);while ($row = mysql_fetch_assoc($result)){ foreach ($row as $key => $value) { print $key.': '.$value."<br />\n"; }}[/code] Quote Link to comment Share on other sites More sharing options...
shoz Posted February 22, 2006 Share Posted February 22, 2006 This is a basic example of the query that I'm thinking may be better btw.[code]SELECTt1.order_id, (t1.sum_recv - IF(ISNULL(t2. sum_shipped), 0, t2.sum_shipped)) AS outstandingFROM ( SELECT order_id, SUM(recv) AS sum_recv FROM recv_table GROUP BY order_id ) AS t1LEFT JOIN ( SELECT order_id, SUM(shipped) as sum_shipped FROM shipped_table GROUP BY order_id ) AS t2ON t1.order_id = t2.order_idWHEREt1.sum_recv -t2. sum_shipped > 0[/code] Quote Link to comment Share on other sites More sharing options...
jworisek Posted February 22, 2006 Author Share Posted February 22, 2006 First off, thanks for all the help on this. Unfortunately I might not be able to get to this until tomorrow morning...One thing that I was thinking that might also be slowing it down some is having to check the shipped table when shipments have not been made for that order... Or maybe not. Just a thought.Do you know any good references for writing and optimizing multi-table queries? Most of my work before this has been with single tables with many columns (based on more of a tree structure of files with each customer having their own DB and all of our products/services as tables. Now I've moved it to a relational table system and am starting to get into queries that involve possibly 10 tables and I really don't have experience in getting them to run fast. I'll try that explain select sample code out though, that could be a big help! Quote Link to comment Share on other sites More sharing options...
shoz Posted February 23, 2006 Share Posted February 23, 2006 [quote]Do you know any good references for writing and optimizing multi-table queries?[/quote][a href=\"http://dev.mysql.com/doc/refman/4.1/en/query-speed.html\" target=\"_blank\"]http://dev.mysql.com/doc/refman/4.1/en/query-speed.html[/a][a href=\"http://hackmysql.com/case5\" target=\"_blank\"]http://hackmysql.com/case5[/a] Quote Link to comment Share on other sites More sharing options...
jworisek Posted February 23, 2006 Author Share Posted February 23, 2006 One large problem was that I was missing an important index in incoming_data (its was checking over 34,000 rows just from that one table). Now that I added a couple more indexes, even my old script with a bunch of seperate queries runs in under a second. This was one of those things that I never would have realized without being able to check the explain selects through php.Also, those links that you provided are just what I was looking for. I just needed to see some examples with explanation to get a feel for it. Again, thanks for the help! :)One last quick question though...I have noticed that the "committed_new" table is still searching every single row and using a temporary table. I get this when I do the explain:id:1 select_type: SIMPLEtable: CNtype: ALLpossible_keys: current_commit_datekey:key:len:ref:rows 669Extra: Using where; Using Temporary; Using filesortwith the following query[code]SELECT O.order_id, sum(required_pieces)-IF(ISNULL(sum(good_pieces+chipped_pieces+reject_pieces)),0,sum(good_pieces+chipped_pieces+reject_pieces)) as pcs_remain from committed_new CN USE INDEX (current_commit_date) INNER JOIN orders O ON (O.order_id=CN.order_id) INNER JOIN line_info LI ON (LI.order_id=O.order_id) INNER JOIN incoming_data ID ON (ID.line_info_id=LI.line_info_id) LEFT JOIN shipping S ON (S.order_id=O.order_id) LEFT JOIN shipping_data SD ON (SD.shipping_id=S.shipping_id) where CN.current_commit_date>=DATE_SUB(curdate(), INTERVAL 14 DAY) group by order_id having pcs_remain>0[/code]Another problem is that it sums up the incoming multiple times... if there has been 3 shipments then the incoming is summed up 3 times... but it runs fast ;) I'll give that last query a check with the subselects to pull the sums out. Quote Link to comment Share on other sites More sharing options...
shoz Posted February 23, 2006 Share Posted February 23, 2006 [quote]I have noticed that the "committed_new" table is still searching every single row and using a temporary table.[/quote]See if adding the index (current_commit_date,order_id) helps. You might also try (order_id, current_commit_date).Btw, this is the example query I posted earlier, rewritten based on the actual tables you show in your query. Which I think may be what you should use as mentioned previously. You can check the accuracy.[code]SELECTCN.order_id,sum_required_pieces-IF(ISNULL(sum_other_pieces), 0 sum_other_pieces) AS pcs_remainFROMcommitted_new AS CNINNER JOIN ( SELECT LI.order_id, order_id, SUM(required_pieces) AS sum_required_pieces FROM line_info AS LI INNER JOIN incoming_data AS ID ON LI.line_info_id=ID.line_info_id GROUP BY LI.order_id ) AS recv_tblON CN.order_id=recv_tbl.order_idLEFT JOIN ( SELECT S.order_id, SUM(good_pieces+chipped_pieces+reject_pieces) AS sum_other_pieces FROM shipping S INNER JOIN shipping_data SD ON S.shipping_id=SD.shipping_id GROUP BY S.order_id ) AS ship_tblON recv_tbl.order_id=ship_tbl.order_idWHERECN.current_commit_date>=DATE_SUB(curdate(), INTERVAL 14 DAY)GROUP BYCN.order_idHAVING pcs_remain>0[/code]EDIT:It may be faster to have the committed_new table as part of the subselects. Rather than outside of them.So you'd have something like[code]SELECT recv_tbl.order_id, sum_etc - sum_etc ( SELECT LI.order_id, order_id, SUM(required_pieces) AS sum_required_pieces FROM committed_new AS CN INNER JOIN line_info AS LI ON CN.order_id=LI.order_id INNER JOIN incoming_data AS ID ON LI.line_info_id=ID.line_info_id WHERE CN.date >= etc GROUP BY LI.order_id ) AS recv_tbl LEFT JOIN ( SELECT S.order_id FROM committed_new etc etc )[/code] Quote Link to comment Share on other sites More sharing options...
jworisek Posted February 23, 2006 Author Share Posted February 23, 2006 I tried the top one and it fixes the problem with checking the entire table for committed_new, unfortunately it doesnt use keys for the incoming_data or shipping_data tables so it scans the entire tables and uses temporary tables.One quick fix that I can do is to divide the incoming sum by the number of shipments to account for the additional summations. Quote Link to comment Share on other sites More sharing options...
jworisek Posted February 23, 2006 Author Share Posted February 23, 2006 I think I might just take the easy route and do the incoming sum in the main query and then have a query after that to look up the information for the shipped sum. Just adding those indexes alone has sped up the time it takes to print each line of the report by 40xor more so there really won't be too much performance hit by breaking up the query. My benchmark report just ran in .646 where it was over 20 seconds before. Quote Link to comment Share on other sites More sharing options...
shoz Posted February 23, 2006 Share Posted February 23, 2006 Try the second query which uses the commited_new table in the subselects. The reasoning behind it is that when the SUBSELECT does the SUMs as you've seen, it does the SUM and GROUP BY for the entire table. While, if you have the commited_new table in the SUBSELECTs with the WHERE clause to find the order_ids in the CN table to JOIN ON, you only have to SUM for those rows/ids.Remember to add the (commited_date,order_id) index.[code]SELECTrecv_tbl.order_id, sum_required_pieces-IF(ISNULL(sum_other_pieces), 0 sum_other_pieces) AS pcs_remainFROM ( SELECT CN.order_id, SUM(required_pieces) AS sum_required_pieces FROM committed_new AS CN INNER JOIN line_info AS LI ON CN.order_id=ID.order_id INNER JOIN incoming_data AS ID ON LI.line_info_id=ID.line_info_id WHERE CN.current_commit_date>=DATE_SUB(curdate(), INTERVAL 14 DAY) GROUP BY CN.order_id ) AS recv_tblLEFT JOIN ( SELECT CN.order_id, SUM(good_pieces+chipped_pieces+reject_pieces) AS sum_other_pieces FROM committed_new AS CN INNER JOIN shipping S ON CN.order_id = S.order_id INNER JOIN shipping_data SD ON S.shipping_id=SD.shipping_id WHERE CN.current_commit_date>=DATE_SUB(curdate(), INTERVAL 14 DAY) GROUP BY CN.order_id ) AS ship_tblON recv_tbl.order_id=ship_tbl.order_idWHERE sum_required_pieces-IF(ISNULL(sum_other_pieces), 0 sum_other_pieces) > 0[/code]EDIT: Changed the query. Quote Link to comment Share on other sites More sharing options...
jworisek Posted February 23, 2006 Author Share Posted February 23, 2006 I updated to try that latest query and at least with preliminary checks it seems to be working correctly...It still is trying to call the entire committed_new table, only now it does it once for each subselect. heres the explain select on it:[code]id: 1select_type: PRIMARYtable: type: ALLpossible_keys: key: key_len: ref: rows: 695Extra: id: 1select_type: PRIMARYtable: type: ALLpossible_keys: key: key_len: ref: rows: 382Extra: id: 3select_type: DERIVEDtable: CNtype: ALLpossible_keys: key: key_len: ref: rows: 704Extra: Using temporary; Using filesortid: 3select_type: DERIVEDtable: Stype: refpossible_keys: PRIMARY,order_idkey: order_idkey_len: 4ref: CN.order_idrows: 2Extra: id: 3select_type: DERIVEDtable: SDtype: refpossible_keys: shipping_idkey: shipping_idkey_len: 4ref: S.shipping_idrows: 3Extra: id: 2select_type: DERIVEDtable: CNtype: ALLpossible_keys: current_commit_datekey: key_len: ref: rows: 704Extra: Using where; Using temporary; Using filesortid: 2select_type: DERIVEDtable: LItype: refpossible_keys: PRIMARY,line_infokey: PRIMARYkey_len: 4ref: CN.order_idrows: 3Extra: id: 2select_type: DERIVEDtable: IDtype: refpossible_keys: line_info_idkey: line_info_idkey_len: 4ref: LI.line_info_idrows: 1Extra: [/code] Quote Link to comment Share on other sites More sharing options...
shoz Posted February 23, 2006 Share Posted February 23, 2006 Post the create statement for the table "committed_new" and the indexes on the table. [!--sql--][div class=\'sqltop\']SQL[/div][div class=\'sqlmain\'][!--sql1--]SHOW CREATE TABLE committed_new;SHOW INDEX [color=green]FROM[/color] [color=orange]committed_new;[/color] [!--sql2--][/div][!--sql3--]Seeing how the first query returns one value for each order_id, you should be able to do away with the second SUBSELECT and do the JOIN directly on the shipping table without problems. It may be faster.You shouldnt notice any difference in the results from this query versus the previous one.[code]SELECTrecv_tbl.order_id, sum_required_pieces-IF(ISNULL(SUM(SD.good_pieces+SD.chipped_pieces+SD.reject_pieces)),0,sum(SD.good_pieces+SD.chipped_pieces+SD.reject_pieces)) AS pcs_remainFROM ( SELECT CN.order_id, SUM(required_pieces) AS sum_required_pieces FROM committed_new AS CN INNER JOIN line_info AS LI ON CN.order_id=ID.order_id INNER JOIN incoming_data AS ID ON LI.line_info_id=ID.line_info_id WHERE CN.current_commit_date>=DATE_SUB(curdate(), INTERVAL 14 DAY) GROUP BY CN.order_id ) AS recv_tblLEFT JOINshipping SON recv_tbl.order_id = S.order_idINNER JOINshipping_data SDONS.shipping_id=SD.shipping_idGROUP BYrecv_tbl.order_idHAVINGpcs_remain > 0[/code] Quote Link to comment Share on other sites More sharing options...
jworisek Posted February 23, 2006 Author Share Posted February 23, 2006 [code] | committed_new | CREATE TABLE `committed_new` ( `committed_id` int(11) NOT NULL auto_increment, `order_id` int(11) NOT NULL default '0', `line_info_id` int(11) NOT NULL default '0', `initial_commit_date` date NOT NULL default '0000-00-00', `current_commit_date` date NOT NULL default '0000-00-00', `comments` varchar(200) NOT NULL default '', `history` text NOT NULL, PRIMARY KEY (`committed_id`), UNIQUE KEY `order_id` (`order_id`,`line_info_id`), KEY `line_info_id` (`line_info_id`), KEY `current_commit_date` (`current_commit_date`)) ENGINE=MyISAM DEFAULT CHARSET=latin1 |[/code][code]+---------------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |+---------------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+| committed_new | 0 | PRIMARY | 1 | committed_id | A | 704 |NULL | NULL | | BTREE | || committed_new | 0 | order_id | 1 | order_id | A | NULL |NULL | NULL | | BTREE | || committed_new | 0 | order_id | 2 | line_info_id | A | 704 |NULL | NULL | | BTREE | || committed_new | 1 | line_info_id | 1 | line_info_id | A | 10 |NULL | NULL | | BTREE | || committed_new | 1 | current_commit_date | 1 | current_commit_date | A | 39 |NULL | NULL | | BTREE | |+---------------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+[/code] Quote Link to comment Share on other sites More sharing options...
shoz Posted February 23, 2006 Share Posted February 23, 2006 Try[!--sql--][div class=\'sqltop\']SQL[/div][div class=\'sqlmain\'][!--sql1--]ALTER TALE committed_new ADD INDEX date_order_id (current_commit_date, order_id);[!--sql2--][/div][!--sql3--] Quote Link to comment Share on other sites More sharing options...
jworisek Posted February 23, 2006 Author Share Posted February 23, 2006 I made that update and it is still checking all the rows in the table... Quote Link to comment Share on other sites More sharing options...
shoz Posted February 23, 2006 Share Posted February 23, 2006 MYSQL may have decided that it's faster not to use the index in this case[a href=\"http://dev.mysql.com/doc/refman/4.1/en/mysql-indexes.html\" target=\"_blank\"]http://dev.mysql.com/doc/refman/4.1/en/mysql-indexes.html[/a]That could change once more rows are in the table. If you try a different date range you also may see a change in whether or not MYSQL uses the index.Someone more knowledgeable may have more insight. Quote Link to comment Share on other sites More sharing options...
jworisek Posted February 23, 2006 Author Share Posted February 23, 2006 I did try another range and it seemed to stay pretty much the same... If this becomes an issue I will most likely just prune the table since we don't really need to access the information after the order has been shipped. Quote Link to comment Share on other sites More sharing options...
fenway Posted February 23, 2006 Share Posted February 23, 2006 Have to tried running ANALZYE TABLE on the table in question? MySQL assumes an even key distribution for indexing, which is rarely the case. Quote Link to comment Share on other sites More sharing options...
jworisek Posted February 27, 2006 Author Share Posted February 27, 2006 Well I had to change a few things up, but I did get it working.It did bring up another question (since Im looking at optimizing reports that used to use multiple queries), when I want to apply billing information for line item, there are 5 different structures of billing set up in an order of precedence (ex standard billing is the lowest and specific order billing is the highest).Can I use a query like below to pull out the highest precedent price for billing?[code]$sql="SELECT price FROM bill_type BT LEFT JOIN billing_low BL ON (BL.bill_type_id=BT.bill_type_id) LEFT JOIN bill_medium BM ON (BM.bill_type_id=BT.bill_type_id) LEFT JOIN bill_high BH ON (BH.bill_type_id=BT.bill_type_id) where ..... "[/code] Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.