Jump to content

Help structuring a table for a large dataset


charper

Recommended Posts

First off, I would like to thank everybody in this community.  While this may be my first post, it's definitely not the first time I've found good information on this site - thank you!

 

So, my problem:

This is not really stock data, but it's a good visualization...

I want to create an archive of data.  Every hour, I'll pull in xml data covering 4 parameters (using the stock data visualization: num sold, num available, high price, low price).  The real kicker though is I could pull in these 4 parameters for up to 7500 individual stocks.  The nice part - these stocks could be a good index value - there are only 7500 of them and they will never change.

 

What I plan on doing is archiving the hourly data for maybe 24 hours.  I would also like to archive lower resolutions for longer periods.  I.E. daily averages for a month or something.

 

Here is my best guess of how to store the data:

 

Every hour I would drop a new row for every index encountered (again, up to 7500).  The table after 1 hour could look like this:

 

 

NameTimestampParam1Param2

11247900000%data%%data%

21247900000%data%%data%

11247903600%data%%data%

21247903600%data%%data%

 

 

Every hour I would have to do a delete row where timestamp < now - 48 hours.  When it's time to archive the data I would run an average on a timestamp range and insert data (in this format) into a new table.

 

However, I'm pretty newbie with mySql optimizations, and I've made some mistakes in the past.  This is a table that's large enough that stuff like this really matters and I could use some help.  Any suggestions for improvement, or is there another way to do this that would be better?

Link to comment
Share on other sites

I recommend the following:

 

The columns "Name" and "Timestamp" would be your primary key. At any given moment there should be around 360 000 rows in that table (48 hours of data). Delete statements on large tables usually don't work very efficiently. But I don't think there would be such a problem on this table. If time is not a problem, create an index on the "Timestamp" field and the "Name" field, insert the averages in your average table and before the delete process occurs, drop those indexes. To create indexes might take some time though.

Link to comment
Share on other sites

Thanks for the advice.

 

Right now I've decided on a multi-column primary key (Name and Timestamp), as you suggested.  This works great for my queries since I will always select tables as WHERE name = 'x' and occasionally add on a timestamp range.

 

I'll have to go back and see if I really need an index on any other columns.  Right now I don't think I'm ever going to be searching them.

 

The hourly table is currently up and running, but not the daily.  If I can ask a question though: why do you recommend dropping and rebuilding the indexes?  Is there a problem I'm not aware of with dropping rows on an indexed table?

Link to comment
Share on other sites

sorry for the late reply. i used to work for a company that did point of sale transactions in an oracle database. when you have a table that contains 800-900 million rows it takes some time to move data. some might say it is inefficient to use indexes at all on tables this size, but i was not the dba. we had to move data on regular intervals to keep the table's rows to a minimum. in our processing jobs we dropped the indexes for the tables we were to 'service' to speed up query speeds.

 

i don't think you will experience a problem with speed (except if you running this on a pentium 3 or something) and to implement a drop index process might be fruitless in your circumstances. but, should speed become a problem you might have to look into this.

 

btw, here's a tip in deleting large amounts of data quickly: create a duplicate table like the one you are about to delete from. insert the necessary data into the newly created table, drop the old table and rename the new one to the old one. this won't go down well on a 'live' table though.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.