Jump to content

Summary Data Table Schema Recommendations..


Scooby08

Recommended Posts

MySQL Version 5.5.28

 

I am working on storing summary data such as counts, averages, rates, times for each hour of the day, grouped by "account_id", "affiliate_id", "hour" for each day.. I've included the start of a table to give an idea as to what I'm trying to do so far, but as you can see the table has many columns and will have many more when I add the rest of my columns..

 

The purpose is for faster page loading of reports.

 

I guess what I'm looking for is any advice on alternative solutions that may work for me here.. Or is this the proper way to store hourly data for faster page loads??

 

Thanks all!!

 

CREATE TABLE `accounts_summary_copy` (

`id` int(11) unsigned NOT NULL AUTO_INCREMENT,

`account_id` int(11) DEFAULT NULL,

`affiliate_id` varchar(64) DEFAULT NULL,

`posted` int(11) DEFAULT NULL,

`posted_time` double DEFAULT NULL,

`accepted` int(11) DEFAULT NULL,

`accepted_rate` double DEFAULT NULL,

`accepted_time` double DEFAULT NULL,

`errored` int(11) DEFAULT NULL,

`errored_rate` double DEFAULT NULL,

`errored_time` double DEFAULT NULL,

`created_date` date NOT NULL DEFAULT '0000-00-00',

`posted_0` int(11) DEFAULT NULL,

`posted_1` int(11) DEFAULT NULL,

`posted_2` int(11) DEFAULT NULL,

`posted_3` int(11) DEFAULT NULL,

`posted_4` int(11) DEFAULT NULL,

`posted_5` int(11) DEFAULT NULL,

`posted_6` int(11) DEFAULT NULL,

`posted_7` int(11) DEFAULT NULL,

`posted_8` int(11) DEFAULT NULL,

`posted_9` int(11) DEFAULT NULL,

`posted_10` int(11) DEFAULT NULL,

`posted_11` int(11) DEFAULT NULL,

`posted_12` int(11) DEFAULT NULL,

`posted_13` int(11) DEFAULT NULL,

`posted_14` int(11) DEFAULT NULL,

`posted_15` int(11) DEFAULT NULL,

`posted_16` int(11) DEFAULT NULL,

`posted_17` int(11) DEFAULT NULL,

`posted_18` int(11) DEFAULT NULL,

`posted_19` int(11) DEFAULT NULL,

`posted_20` int(11) DEFAULT NULL,

`posted_21` int(11) DEFAULT NULL,

`posted_22` int(11) DEFAULT NULL,

`posted_23` int(11) DEFAULT NULL,

`posted_time_0` double DEFAULT NULL,

`posted_time_1` double DEFAULT NULL,

`posted_time_2` double DEFAULT NULL,

`posted_time_3` double DEFAULT NULL,

`posted_time_4` double DEFAULT NULL,

`posted_time_5` double DEFAULT NULL,

`posted_time_6` double DEFAULT NULL,

`posted_time_7` double DEFAULT NULL,

`posted_time_8` double DEFAULT NULL,

`posted_time_9` double DEFAULT NULL,

`posted_time_10` double DEFAULT NULL,

`posted_time_11` double DEFAULT NULL,

`posted_time_12` double DEFAULT NULL,

`posted_time_13` double DEFAULT NULL,

`posted_time_14` double DEFAULT NULL,

`posted_time_15` double DEFAULT NULL,

`posted_time_16` double DEFAULT NULL,

`posted_time_17` double DEFAULT NULL,

`posted_time_18` double DEFAULT NULL,

`posted_time_19` double DEFAULT NULL,

`posted_time_20` double DEFAULT NULL,

`posted_time_21` double DEFAULT NULL,

`posted_time_22` double DEFAULT NULL,

`posted_time_23` double DEFAULT NULL,

`accepted_0` int(11) DEFAULT NULL,

`accepted_1` int(11) DEFAULT NULL,

`accepted_2` int(11) DEFAULT NULL,

`accepted_3` int(11) DEFAULT NULL,

`accepted_4` int(11) DEFAULT NULL,

`accepted_5` int(11) DEFAULT NULL,

`accepted_6` int(11) DEFAULT NULL,

`accepted_7` int(11) DEFAULT NULL,

`accepted_8` int(11) DEFAULT NULL,

`accepted_9` int(11) DEFAULT NULL,

`accepted_10` int(11) DEFAULT NULL,

`accepted_11` int(11) DEFAULT NULL,

`accepted_12` int(11) DEFAULT NULL,

`accepted_13` int(11) DEFAULT NULL,

`accepted_14` int(11) DEFAULT NULL,

`accepted_15` int(11) DEFAULT NULL,

`accepted_16` int(11) DEFAULT NULL,

`accepted_17` int(11) DEFAULT NULL,

`accepted_18` int(11) DEFAULT NULL,

`accepted_19` int(11) DEFAULT NULL,

`accepted_20` int(11) DEFAULT NULL,

`accepted_21` int(11) DEFAULT NULL,

`accepted_22` int(11) DEFAULT NULL,

`accepted_23` int(11) DEFAULT NULL,

`accepted_rate_0` double DEFAULT NULL,

`accepted_rate_1` double DEFAULT NULL,

`accepted_rate_2` double DEFAULT NULL,

`accepted_rate_3` double DEFAULT NULL,

`accepted_rate_4` double DEFAULT NULL,

`accepted_rate_5` double DEFAULT NULL,

`accepted_rate_6` double DEFAULT NULL,

`accepted_rate_7` double DEFAULT NULL,

`accepted_rate_8` double DEFAULT NULL,

`accepted_rate_9` double DEFAULT NULL,

`accepted_rate_10` double DEFAULT NULL,

`accepted_rate_11` double DEFAULT NULL,

`accepted_rate_12` double DEFAULT NULL,

`accepted_rate_13` double DEFAULT NULL,

`accepted_rate_14` double DEFAULT NULL,

`accepted_rate_15` double DEFAULT NULL,

`accepted_rate_16` double DEFAULT NULL,

`accepted_rate_17` double DEFAULT NULL,

`accepted_rate_18` double DEFAULT NULL,

`accepted_rate_19` double DEFAULT NULL,

`accepted_rate_20` double DEFAULT NULL,

`accepted_rate_21` double DEFAULT NULL,

`accepted_rate_22` double DEFAULT NULL,

`accepted_rate_23` double DEFAULT NULL,

`accepted_time_0` double DEFAULT NULL,

`accepted_time_1` double DEFAULT NULL,

`accepted_time_2` double DEFAULT NULL,

`accepted_time_3` double DEFAULT NULL,

`accepted_time_4` double DEFAULT NULL,

`accepted_time_5` double DEFAULT NULL,

`accepted_time_6` double DEFAULT NULL,

`accepted_time_7` double DEFAULT NULL,

`accepted_time_8` double DEFAULT NULL,

`accepted_time_9` double DEFAULT NULL,

`accepted_time_10` double DEFAULT NULL,

`accepted_time_11` double DEFAULT NULL,

`accepted_time_12` double DEFAULT NULL,

`accepted_time_13` double DEFAULT NULL,

`accepted_time_14` double DEFAULT NULL,

`accepted_time_15` double DEFAULT NULL,

`accepted_time_16` double DEFAULT NULL,

`accepted_time_17` double DEFAULT NULL,

`accepted_time_18` double DEFAULT NULL,

`accepted_time_19` double DEFAULT NULL,

`accepted_time_20` double DEFAULT NULL,

`accepted_time_21` double DEFAULT NULL,

`accepted_time_22` double DEFAULT NULL,

`accepted_time_23` double DEFAULT NULL,

`errored_0` int(11) DEFAULT NULL,

`errored_1` int(11) DEFAULT NULL,

`errored_2` int(11) DEFAULT NULL,

`errored_3` int(11) DEFAULT NULL,

`errored_4` int(11) DEFAULT NULL,

`errored_5` int(11) DEFAULT NULL,

`errored_6` int(11) DEFAULT NULL,

`errored_7` int(11) DEFAULT NULL,

`errored_8` int(11) DEFAULT NULL,

`errored_9` int(11) DEFAULT NULL,

`errored_10` int(11) DEFAULT NULL,

`errored_11` int(11) DEFAULT NULL,

`errored_12` int(11) DEFAULT NULL,

`errored_13` int(11) DEFAULT NULL,

`errored_14` int(11) DEFAULT NULL,

`errored_15` int(11) DEFAULT NULL,

`errored_16` int(11) DEFAULT NULL,

`errored_17` int(11) DEFAULT NULL,

`errored_18` int(11) DEFAULT NULL,

`errored_19` int(11) DEFAULT NULL,

`errored_20` int(11) DEFAULT NULL,

`errored_21` int(11) DEFAULT NULL,

`errored_22` int(11) DEFAULT NULL,

`errored_23` int(11) DEFAULT NULL,

`errored_rate_0` double DEFAULT NULL,

`errored_rate_1` double DEFAULT NULL,

`errored_rate_2` double DEFAULT NULL,

`errored_rate_3` double DEFAULT NULL,

`errored_rate_4` double DEFAULT NULL,

`errored_rate_5` double DEFAULT NULL,

`errored_rate_6` double DEFAULT NULL,

`errored_rate_7` double DEFAULT NULL,

`errored_rate_8` double DEFAULT NULL,

`errored_rate_9` double DEFAULT NULL,

`errored_rate_10` double DEFAULT NULL,

`errored_rate_11` double DEFAULT NULL,

`errored_rate_12` double DEFAULT NULL,

`errored_rate_13` double DEFAULT NULL,

`errored_rate_14` double DEFAULT NULL,

`errored_rate_15` double DEFAULT NULL,

`errored_rate_16` double DEFAULT NULL,

`errored_rate_17` double DEFAULT NULL,

`errored_rate_18` double DEFAULT NULL,

`errored_rate_19` double DEFAULT NULL,

`errored_rate_20` double DEFAULT NULL,

`errored_rate_21` double DEFAULT NULL,

`errored_rate_22` double DEFAULT NULL,

`errored_rate_23` double DEFAULT NULL,

`errored_time_0` double DEFAULT NULL,

`errored_time_1` double DEFAULT NULL,

`errored_time_2` double DEFAULT NULL,

`errored_time_3` double DEFAULT NULL,

`errored_time_4` double DEFAULT NULL,

`errored_time_5` double DEFAULT NULL,

`errored_time_6` double DEFAULT NULL,

`errored_time_7` double DEFAULT NULL,

`errored_time_8` double DEFAULT NULL,

`errored_time_9` double DEFAULT NULL,

`errored_time_10` double DEFAULT NULL,

`errored_time_11` double DEFAULT NULL,

`errored_time_12` double DEFAULT NULL,

`errored_time_13` double DEFAULT NULL,

`errored_time_14` double DEFAULT NULL,

`errored_time_15` double DEFAULT NULL,

`errored_time_16` double DEFAULT NULL,

`errored_time_17` double DEFAULT NULL,

`errored_time_18` double DEFAULT NULL,

`errored_time_19` double DEFAULT NULL,

`errored_time_20` double DEFAULT NULL,

`errored_time_21` double DEFAULT NULL,

`errored_time_22` double DEFAULT NULL,

`errored_time_23` double DEFAULT NULL,

PRIMARY KEY (`id`),

UNIQUE KEY `accid_affid_date` (`account_id`,`affiliate_id`,`created_date`),

KEY `account_id` (`account_id`),

KEY `affiliate_id` (`affiliate_id`),

KEY `created_date` (`created_date`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Link to comment
Share on other sites

Even IF (that's a BIG if) it is necessary to store aggregated data for summaries, it is NOT recommended to store multiple columns of data with the same meaning. If you have to add a sequence number to the name of a column, then you need to split those columns out into a child table.

 

This child table should be keyed by date and hour. Then you can keep as much summary data as you want, and let the queries determine the period being examined.

 

Store only the aggregates, do any calculations possible at query-time. I mean, do not store the average for a given hour, if you can calculate it from other data in the row.

 

There is probably NO advantage to defining a KEY on account_id (in your example) since it is the first field in the composite KEY, you essentially have that index available already.

Link to comment
Share on other sites

Alrighty DavidAM.. Good stuff! I appreciate the response..

 

In my case, it's definitely necessary to store aggregated data for summary reports (being there's so many records), so I created a child table like so..

 

CREATE TABLE `accounts_summary_child` (

`id` int(11) unsigned NOT NULL AUTO_INCREMENT,

`account_id` int(11) DEFAULT NULL,

`affiliate_id` varchar(64) DEFAULT NULL,

`created_date` date NOT NULL DEFAULT '0000-00-00',

`hour` int(11) DEFAULT NULL,

`posted` int(11) DEFAULT NULL,

`posted_time` double DEFAULT NULL,

`accepted` int(11) DEFAULT NULL,

`accepted_rate` double DEFAULT NULL,

`accepted_time` double DEFAULT NULL,

`errored` int(11) DEFAULT NULL,

`errored_rate` double DEFAULT NULL,

`errored_time` double DEFAULT NULL,

`posted_postback` int(11) DEFAULT NULL,

`posted_postback_time` double DEFAULT NULL,

`accepted_postback` int(11) DEFAULT NULL,

`accepted_postback_rate` double DEFAULT NULL,

`accepted_postback_time` double DEFAULT NULL,

`errored_postback` int(11) DEFAULT NULL,

`errored_postback_rate` double DEFAULT NULL,

`errored_postback_time` double DEFAULT NULL,

`sales` int(11) DEFAULT NULL,

`sales_rate` double DEFAULT NULL,

`sales_sum_price` decimal(10,2) DEFAULT NULL,

`sales_avg_price` decimal(10,2) DEFAULT NULL,

`score` int(11) DEFAULT NULL,

`score_avg` int(11) DEFAULT NULL,

PRIMARY KEY (`id`),

KEY `date_hour` (`created_date`,`hour`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

 

But then again you said to remove anything that can be calculated by other rows so I then broke it down to this

 

CREATE TABLE `accounts_summary_child` (

`id` int(11) unsigned NOT NULL AUTO_INCREMENT,

`account_id` int(11) DEFAULT NULL,

`affiliate_id` varchar(64) DEFAULT NULL,

`created_date` date NOT NULL DEFAULT '0000-00-00',

`hour` int(11) DEFAULT NULL,

`posted` int(11) DEFAULT NULL,

`posted_time` double DEFAULT NULL,

`accepted` int(11) DEFAULT NULL,

`accepted_time` double DEFAULT NULL,

`errored` int(11) DEFAULT NULL,

`errored_time` double DEFAULT NULL,

`posted_postback` int(11) DEFAULT NULL,

`posted_postback_time` double DEFAULT NULL,

`accepted_postback` int(11) DEFAULT NULL,

`accepted_postback_time` double DEFAULT NULL,

`errored_postback` int(11) DEFAULT NULL,

`errored_postback_time` double DEFAULT NULL,

`sales` int(11) DEFAULT NULL,

`sales_sum_price` decimal(10,2) DEFAULT NULL,

`score` int(11) DEFAULT NULL,

PRIMARY KEY (`id`),

KEY `date_hour` (`created_date`,`hour`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

 

Am I on the right track here? Or have I misread your suggestions..

 

Thanks!

Link to comment
Share on other sites

You would not carry the account_id or affiliate_id in the child table. You would carry the parent table's id (as a foreign key).

 

If I am reading this correctly, the account_id, affiliate_id, created_date and hour columns identify a row, right? I would move the created_date up into the parent record as well. I would also rename it to "summary_date", I assume it represents the date being summarized, not the date the summary record was created.

 

So now your tables look like this:

CREATE TABLE accounts_summary_copy (
id int(11) unsigned NOT NULL AUTO_INCREMENT,
account_id int(11) DEFAULT NULL,
affiliate_id varchar(64) DEFAULT NULL,
-- RENAMED FROM created_date TO summary_date
summary_date date NOT NULL DEFAULT '0000-00-00',

PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE accounts_summary_child (
id int(11) unsigned NOT NULL AUTO_INCREMENT,
-- REMOVED account_id, affiliate_id AND created_date ADDED parent_id INSTEAD
parent_id int(11) unsigned NOT NULL,
-- CHANGED NAME OF hour COLUMN ALSO CHANGE DATATYPE
summary_hour TINYINT UNSIGNED DEFAULT NULL,
posted int(11) DEFAULT NULL,
posted_time double DEFAULT NULL,
accepted int(11) DEFAULT NULL,
accepted_time double DEFAULT NULL,
errored int(11) DEFAULT NULL,
errored_time double DEFAULT NULL,
posted_postback int(11) DEFAULT NULL,
posted_postback_time double DEFAULT NULL,
accepted_postback int(11) DEFAULT NULL,
accepted_postback_time double DEFAULT NULL,
errored_postback int(11) DEFAULT NULL,
errored_postback_time double DEFAULT NULL,
sales int(11) DEFAULT NULL,
sales_sum_price decimal(10,2) DEFAULT NULL,
score int(11) DEFAULT NULL,
PRIMARY KEY (id),
-- ADD FOREIGN KEY
FOREIGN KEY (parent_id) REFERENCES accounts_summary_copy(id)
ON UPDATE CASCADE ON DELETE CASCADE,
-- CHANGE ALTERNATE KEY SINCE WE REMOVED THE created_date
KEY hour_parent (hour,parent_id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

 

Note: I never use the backticks for column names. They are only necessary if the column name is a reserved word or otherwise non-standard. I find it best to avoid reserved words (and potential reserved words) for column names as it can create issues later on. You may note that I changed the `hour` column to summary_hour. I'm not sure if HOUR is a reserved word or not, but I think it has potential.

 

The FOREIGN KEY constraint will add an index on parent_id automatically.

 

I'm not sure of the data definition for some of your columns. But concerning the removal of the "calculated" columns. If, for instance, score_avg, can be calculated from the other data in the row, you can safely remove it. If the value is dependent on other rows (say a running average for the day), you may want to leave it in. I would personally, do some testing with live data to determine if I could live with the overhead or not. (And whether or not I could come up with the correct query).

 

You may want additional indexes on the parent table (for account and/or affiliate, and possibly date) it depends on how you intend to query it.

 

[Edit]: I don't have as much experience with OLDP (and summary tables) as I do with OLTP. So, I may be pushing too far towards normalization. Maybe someone else will pop in and confirm or refute my suggestions.

Edited by DavidAM
Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.