Christian F. Posted February 21, 2013 Share Posted February 21, 2013 (edited) A little background on the project I'm working on: I have a reasonably large number of values, over 500k rows worth. The values are generated per hour, per parent. I want to group these values by one of four time intervals: Month, week, day or hour, and get the average for this period. The challenge comes by the fact that I need to split the values into two different blocks, based upon whether the value was generated during the "day" or "night/weekend". So far I've come up with two different approaches to this: Have two fields in the table for saving the values, one for the "daytime" values and the other for "night time/weekend" values.This would allow me to run separate functions on the fields, without involving the date functions in MySQL. At the cost of having two fields for what is essentially the same type of data, and having to do the splitting in the pre-insert phase. Save all values in the same field, and then use MySQL's datetime functions with a CASE-WHEN to figure out in which block they belong.This approach seems to be the cleanest one in terms of database-design, but I fear it will put a lot of strain on the database. Especially since the SELECT statements will run more often than the insertion script. So the question is if I should go for the first approach, the second, or if there is some other solution to this that I've failed to grasp? I could really do with some expert help on this one. Example values: timestamp parent_1 parent_2 parent_3 2013.02.04 04:00:00 6,300000 1,400000 4,000000 2013.02.04 05:00:00 6,300000 1,400000 4,000000 2013.02.04 06:00:00 6,300000 1,400000 4,000000 2013.02.04 07:00:00 6,300000 1,400000 4,000000 2013.02.04 08:00:00 6,300000 1,400000 4,000000 2013.02.04 09:00:00 6,300000 1,400000 4,000000 2013.02.04 10:00:00 13,600000 9,900000 10,800000 2013.02.04 11:00:00 13,600000 9,900000 10,800000 Desired output: Date: Day avg. Night avg. -- Parent_1 2013.02.04 99,5 6,3 ..... -- Parent_2 2013.02.04 5,65 1,4 ..... -- Parent_3 2013.02.04 4,9 4,0 ..... Edited February 21, 2013 by Christian F. Quote Link to comment Share on other sites More sharing options...
Barand Posted February 21, 2013 Share Posted February 21, 2013 http://forums.phpfreaks.com/topic/273634-best-way-to-set-up-tables-when-multiple-values/?do=findComment&comment=1408360 Quote Link to comment Share on other sites More sharing options...
Christian F. Posted February 21, 2013 Author Share Posted February 21, 2013 (edited) Thank you, Barand, it was just what I needed (I think). I've come to the conclusion that here is indeed a third option, which I missed: Creating a new table, called "data_timeslot", and referencing it in the data table. Then group the data based upon this field, to separate the data properly. Benefits of this, as I see it, is that the integrity of the data is not dependent upon the table structure, I don't have to use datetime calculations to sort each individual row, and it's possible to change/add the timezones without altering the structure of the tables themselves. Thank you again, Barand. Oh, and if I'm still doing it wrong, please feel free to shout at me. Table definitons: CREATE TABLE `data_timeslot` ( `id` TINYINT UNSIGNED AUTO_INCREMENT, `name` VARCHAR(30) NOT NULL, PRIMARY KEY(`id`) ) ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_general_ci; CREATE TABLE `data` ( `id` BIGINT UNSIGNED AUTO_INCREMENT, `parent_id` INT UNSIGNED NOT NULL, `timeslot_id` TINYINT UNSIGNED NOT NULL, `timestamp` TIMESTAMP NOT NULL, `value` DECIMAL(9,6) NOT NULL, PRIMARY KEY(`id`), FOREIGN KEY(`parent_id`) REFERENCES `parent`(`id`) ON UPDATE CASCADE ON DELETE RESTRICT, FOREIGN KEY(`timeslot_id` ) REFERENCES `data_timeslot`(`id`) ON UPDATE CASCADE ON DELETE RESTRICT ) ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_general_ci; Yes, I know you don't like backticks in the SQL code. Edited February 21, 2013 by Christian F. Quote Link to comment Share on other sites More sharing options...
Barand Posted February 21, 2013 Share Posted February 21, 2013 Much better, I could live with that BTW, is the "spoiler" a standard feature on this board. If so, how? Quote Link to comment Share on other sites More sharing options...
Jessica Posted February 21, 2013 Share Posted February 21, 2013 Use [ spoiler] just like [ code] Quote Link to comment Share on other sites More sharing options...
Christian F. Posted February 21, 2013 Author Share Posted February 21, 2013 Hehe, good to know. The spoiler tags are indeed a standard feature of the board, easily added by using around the content you want to spoil. Quote Link to comment Share on other sites More sharing options...
jazzman1 Posted February 22, 2013 Share Posted February 22, 2013 Christian, did you get an error message when you try to create the second table - data. I've got an error and I'm thinking you have a problem creating the foreign key. Quote Link to comment Share on other sites More sharing options...
jazzman1 Posted February 22, 2013 Share Posted February 22, 2013 Ops..... you have one more table, named - parent. Sorry about that Quote Link to comment Share on other sites More sharing options...
Christian F. Posted February 22, 2013 Author Share Posted February 22, 2013 (edited) It's not as much a problem with creating the foreign key, as it's the entire "parent" table missing from the code I posted above. I didn't include it as it was trivial, and not relevant to the problem at hand. Should be easy to recreate it. While we're (still) on this subject, I was hoping if someone could perhaps tell me if there's a way to make this query use indices? Right now no matter what indices I add, and in what combination, it still insist upon using "temporary" and "filesort". $Query = <<<OutSQL SELECT s.`id`, s.`name`, DATE_FORMAT(d.`timestamp`, '$SQLFormat') AS date_res, AVG(d.`percent`) AS `avg`, t.`name` AS timeslot FROM `$TableData` AS d INNER JOIN `$TableParent` AS s ON s.`id` = d.`station_id` INNER JOIN `$TableTime` AS t ON t.`id` = d.`timeslot_id` WHERE d.`timestamp` >= %s AND d.`timestamp` <= %s $Where GROUP BY d.`parent_id`, date_res, d.`timeslot_id` ORDER BY s.`name` ASC,d.`timestamp` ASC OutSQL; The joins themselves are good, but it's the fetching from the primary table that causes issues. Explain: +----+-------------+-------+------+--------------------------------------------+--------------+---------+-------------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+--------------------------------------------+--------------+---------+-------------------+------+---------------------------------+ | 1 | SIMPLE | s | ALL | PRIMARY | NULL | NULL | NULL | 5 | Using temporary; Using filesort | | 1 | SIMPLE | d | ref | timeslot_id,timestamp,id_time_slot,id_slot | id_time_slot | 4 | s.id | 1 | Using where | | 1 | SIMPLE | t | ALL | PRIMARY | NULL | NULL | NULL | 2 | Using where; Using join buffer | +----+-------------+-------+------+--------------------------------------------+--------------+---------+-------------------+------+---------------------------------+ Edited February 22, 2013 by Christian F. Quote Link to comment Share on other sites More sharing options...
Jessica Posted February 22, 2013 Share Posted February 22, 2013 (edited) I've never done it but a coworker told me it was possible a few weeks ago. http://dev.mysql.com...ndex-hints.html Might help. Edit: and according to the examples you can do it on the first table not just the joined ones. Edited February 22, 2013 by Jessica Quote Link to comment Share on other sites More sharing options...
Barand Posted February 22, 2013 Share Posted February 22, 2013 An index on the timestamp column should help. Also, there shouldn't be any need to format the timestamp if you just want standard yyyy-mm--dd format. Just select DATE(timestamp) to get the date portion Quote Link to comment Share on other sites More sharing options...
jazzman1 Posted February 22, 2013 Share Posted February 22, 2013 I think that oracle's optimizing guide will help you a lot - http://docs.oracle.com/html/A95912_01/wn32tune.htm#i631457 Quote Link to comment Share on other sites More sharing options...
Christian F. Posted February 23, 2013 Author Share Posted February 23, 2013 (edited) Jessica: Heh... Thanks for the tip, but unfortunately I don't know nearly enough about MySQL indices and other internals to start to manually mess with them. At least not for now. It's bookmarked though, for future reading. Barand: Thanks, that seems to have done something, but it didn't actually use the timestamp index..? Instead it chose the "parent_id, timeslot" index. Which, upon reading the EXPLAIN result makes sense: +----+-------------+-------+------+----------------------------------------------------------+---------+---------+-------------------------------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+----------------------------------------------------------+---------+---------+-------------------------------------+------+---------------------------------+ | 1 | SIMPLE | t | ALL | PRIMARY | NULL | NULL | NULL | 2 | Using temporary; Using filesort | | 1 | SIMPLE | s | ALL | PRIMARY | NULL | NULL | NULL | 5 | Using join buffer | | 1 | SIMPLE | d | ref | timeslot_id,timestamp,id_time_slot,id_slot,idx_timestamp | id_slot | 5 | database.s.id,database.t.id | 49 | Using where | +----+-------------+-------+------+----------------------------------------------------------+---------+---------+-------------------------------------+------+---------------------------------+ "t" is the alias for the timeslot table (only 2 rows in it), and "s" the alias for the parent table. "d" is the data table, with all of the rows. Upon having a second look: I have no idea why adding the idx_timestamp index made any difference. Seeing as I had an index on the timestamp already..? Oh, well.. Guess I'll just have to check the performance once in a while, to see how it's holding up. At least it's using an index for the main data table. The format of the timestamp is actually one out of four available, so the formatting is (unfortunately) necessary. Jazzman: Thanks for trying to help, but unfortunately that guide was a bit too basic for my needs. Also, not sure how applicable it is for a MySQL database. Sentiment is appreciated though. Edited February 23, 2013 by Christian F. Quote Link to comment Share on other sites More sharing options...
PrecisionGW1 Posted February 23, 2013 Share Posted February 23, 2013 I have my site, I am now ready to build the table and setup the database. Could really use a hand. Any advice? C Quote Link to comment Share on other sites More sharing options...
trq Posted February 23, 2013 Share Posted February 23, 2013 I have my site, I am now ready to build the table and setup the database. Could really use a hand. Any advice? C Yeah. Open your own thread with some more specific questions. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.