
What is the best way to store a large amount of data


Jiblix

Recommended Posts

Greetings!

 

A little background: I run a restaurant which has a system in place that stores data on every vehicle that passes through the drive-thru lane (time in lane, time at the order-taking spot, time at the window, etc.). I am writing a script which takes this XML dump and stores it in MySQL. I am wondering what the best way of storing this data would be (roughly 1,000 entries per day, 7 days a week, 365 days a year).

 

I am wondering if a new table should be created automatically each day, or if storing it all in one large table would work. With this data there will be lots of scripts for sorting the data to show average results per day, per week, by hour, etc. I haven't worked with databases of this size before, so I am at a loss as to where to start to ensure I am able to filter this data as quickly as possible.

 

Thanks for any feedback!


Do not break the data up into separate tables based upon day or some other arbitrary parameter. You should absolutely normalize the data if there is a logical structure to it. E.g., if you were to store the order details for each drive-thru order, then that data would be stored in a separate table with a foreign key back to the drive-thru entry.
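As a rough sketch of what that normalized layout might look like (table and column names here are purely illustrative, not from the original system):

```sql
-- One row per vehicle passing through the lane
CREATE TABLE drive_thru_visit (
    visit_id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    entered_at  TIMESTAMP NOT NULL,          -- when the vehicle entered the lane
    order_secs  SMALLINT UNSIGNED NOT NULL,  -- time at the order-taking spot
    window_secs SMALLINT UNSIGNED NOT NULL,  -- time at the window
    lane_secs   SMALLINT UNSIGNED NOT NULL   -- total time in lane
) ENGINE=InnoDB;

-- Order details, if you decide to store them, go in their own table
CREATE TABLE visit_order_item (
    item_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    visit_id  INT UNSIGNED NOT NULL,
    item_name VARCHAR(64) NOT NULL,
    FOREIGN KEY (visit_id) REFERENCES drive_thru_visit (visit_id)
) ENGINE=InnoDB;
```

One table for all days, with a foreign key relating child rows back to the visit, instead of a new table per day.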

 

If/when you have performance problems, you should first ensure your table is structured correctly - e.g. are the applicable fields indexed, and check the efficiency of your queries. If everything is correct, then you can look into archiving historical data.
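For example, assuming a timestamp column like entered_at (a hypothetical name), you would index the columns your reports filter on and then use EXPLAIN to confirm MySQL actually uses the index:

```sql
-- Index the column the reporting queries filter and group on
CREATE INDEX idx_visit_entered_at ON drive_thru_visit (entered_at);

-- EXPLAIN shows the query plan, including which index (if any) is used
EXPLAIN SELECT *
FROM drive_thru_visit
WHERE entered_at >= '2017-01-01' AND entered_at < '2017-02-01';
```

If the EXPLAIN output shows a full table scan instead of the index, that is the first thing to fix before considering archiving.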


Modern databases such as MySQL, and even more powerful ones, can handle pretty large tables and collections of tables. As said above, you have described only a single table so far. Thinking about a drive-thru window that provides some service, I see at most 30 entries per hour, which is almost 200k per year. That is a lot, and if you have more than one window, or actually do have this kind of volume, you will have a large db pretty soon. But as you said, the data is being collected by a separate online system and you are simply taking that data into an 'offline' db, which will not be bothered by this volume for a long time, since it will be used by a limited number of clients with no online demand. I don't know exactly what numbers MySQL can handle, but a few hundred thousand rows doesn't seem outlandish.

 

Some info found in the MySQL docs for version 5.7:

"The effective maximum table size for MySQL databases is usually determined by operating system constraints on file sizes, not by MySQL internal limits. For up-to-date information operating system file size limits, refer to the documentation specific to your operating system.

 

Windows users, please note that FAT and VFAT (FAT32) are not considered suitable for production use with MySQL. Use NTFS instead. "

There is also mention of size limits for databases that use MyISAM instead of InnoDB, so you should look at that in your design as well.

 

The link for this is: https://dev.mysql.com/doc/refman/5.7/en/table-size-limit.html


365k rows per year is not a large dataset at all. To give you a point of reference, I run a few sites on a $40/month 8GB Linode, including a database with a 2.3 million row statistics table. Queries against those statistics, which include joins and do ranking and summarization, typically complete in milliseconds.

 

Make sure that you are using InnoDB tables, and have allocated sufficient resources to the MySQL server to allow for buffer pool caching of data.
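A quick sketch of checking and sizing the buffer pool (the 1GB figure is just an assumption for illustration; size it to your data and available RAM):

```sql
-- Check the current buffer pool size (in bytes)
SELECT @@innodb_buffer_pool_size;

-- Since MySQL 5.7 the buffer pool can be resized at runtime;
-- 1GB here is only an illustrative figure
SET GLOBAL innodb_buffer_pool_size = 1073741824;
```

For a permanent setting, put the equivalent innodb_buffer_pool_size line in your server configuration file so it survives restarts.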

 

As others commented already, you want to have the best database design possible, including normalization, proper indexing, and minimal and correct datatype usage.

 

For example, if you have dates and can use a TIMESTAMP instead of a DATETIME, do that. If you have numbers and can use a TINYINT in place of a MEDIUMINT or INT, use that.

 

Don't use CHAR unless you are guaranteed to have that number of characters and the column is NOT NULL. Use VARCHAR in most cases.
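Putting those datatype suggestions together, a compact column layout might look like this (names are illustrative only):

```sql
CREATE TABLE lane_timing (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    recorded_at TIMESTAMP NOT NULL,         -- smaller than DATETIME
    lane_no     TINYINT UNSIGNED NOT NULL,  -- 1 byte is plenty for a lane number
    wait_secs   SMALLINT UNSIGNED NOT NULL, -- 0-65535 covers any realistic wait
    note        VARCHAR(100) NULL           -- VARCHAR, not CHAR, for variable text
) ENGINE=InnoDB;
```

Every byte saved per row is multiplied across hundreds of thousands of rows and makes the buffer pool cache that much more effective.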

 

The most important ingredients for maximizing your performance on a given platform are to keep the database as small as possible through your table design, and to have the right indexing in place to handle the queries you'll be doing.
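To tie this back to the reports you mentioned, an average-per-hour query over an indexed timestamp column might look like this (table and column names are illustrative, matching the sketch of a drive-thru visits table):

```sql
-- Average time in lane by hour of day, across all visits
SELECT HOUR(entered_at) AS hour_of_day,
       AVG(lane_secs)   AS avg_lane_secs,
       COUNT(*)         AS visits
FROM drive_thru_visit
GROUP BY HOUR(entered_at)
ORDER BY hour_of_day;
```

Swapping HOUR() for DATE() or YEARWEEK() gives you the per-day and per-week versions of the same report.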


Archived

This topic is now archived and is closed to further replies.
