Jiblix Posted March 12, 2018 Greetings! A little background: I run a restaurant with a system in place that stores data on every vehicle that passes through the drive-thru lane (time in lane, time at the order-taking spot, time at the window, etc.). I am writing a script that takes this XML dump and stores it in MySQL. What would be the best way of storing this data (roughly 1k entries per day, 7 days a week, 365 days a year)? Should a new table be auto-created daily, or would storing it all in one large table work? There will be lots of scripts for sorting this data to show average results per day, per week, by hour, etc. I haven't worked with databases of this size before, so I am at a loss as to where to start to ensure I can filter this data as quickly as possible. Thanks for any feedback!
Psycho Posted March 12, 2018 Do not break up the data into separate tables based on day or some other arbitrary parameter. You should absolutely normalize the data if there is a logical structure to it. E.g., if you were to store the order details for each drive-thru order, that data would be stored in a separate table with a foreign key back to the drive-thru entry. If/when you have performance problems, first ensure your table is structured correctly - e.g., are the applicable fields indexed? - and check the efficiency of your queries. If everything is correct, then you can look into archiving historical data.
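A minimal sketch of the normalization described above, using Python's built-in sqlite3 as a stand-in for MySQL (the table and column names here are hypothetical, not from the original poster's system):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite needs this turned on explicitly

# One row per vehicle through the lane -- the single main table.
conn.execute("""
    CREATE TABLE drive_thru_visit (
        id INTEGER PRIMARY KEY,
        entered_at TEXT NOT NULL
    )
""")

# Order details live in their own table, keyed back to the visit,
# rather than being crammed into the visit row.
conn.execute("""
    CREATE TABLE order_item (
        id INTEGER PRIMARY KEY,
        visit_id INTEGER NOT NULL REFERENCES drive_thru_visit (id),
        item_name TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO drive_thru_visit (id, entered_at) VALUES (1, '2018-03-12 11:05:00')")
conn.execute("INSERT INTO order_item (visit_id, item_name) VALUES (1, 'burger')")

# The foreign key rejects order rows that point at no existing visit.
try:
    conn.execute("INSERT INTO order_item (visit_id, item_name) VALUES (99, 'fries')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)  # True
```

In MySQL the same idea would use InnoDB tables with a `FOREIGN KEY ... REFERENCES` constraint; the point is one main table plus child tables, never one table per day.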
ginerjm Posted March 12, 2018 Modern DBs such as MySQL, and even more powerful ones, can handle pretty large tables and collections of tables. As said above, you have described only a single table so far. Thinking about a drive-thru window that provides some service, I see at most 30 entries per hour, which is almost 200k per year. That is a lot, and if you have more than one window, or actually have this kind of volume, you will have a large DB pretty soon. But as you said, the data is being collected by a separate online system and you are simply taking it for an 'offline' DB, which will not be bothered by this volume for a long time, since it will be used by limited clients with no online demand. I don't know exactly what numbers MySQL can handle, but a few hundred thousand doesn't seem outlandish. Some info found in the MySQL docs for version 5.7: "The effective maximum table size for MySQL databases is usually determined by operating system constraints on file sizes, not by MySQL internal limits. For up-to-date information about operating system file size limits, refer to the documentation specific to your operating system. Windows users, please note that FAT and VFAT (FAT32) are not considered suitable for production use with MySQL. Use NTFS instead." There is also mention of size limits for DBs that use MyISAM instead of InnoDB, so you should look at that in your design as well. The link for this is: https://dev.mysql.com/doc/refman/5.7/en/table-size-limit.html
gizmola Posted March 12, 2018 365k rows per year is not a large dataset at all. To give you a point of reference, I run a few sites on a $40/month 8GB Linode, including a database with a 2.3-million-row statistics table. Queries against those statistics, including joins, ranking, and summarization, typically finish in milliseconds. Make sure you are using InnoDB tables and have allocated sufficient resources to the MySQL server to allow for buffer pool caching of data. As others commented already, you want the best database design possible, including normalization, proper indexing, and minimal, correct datatype usage. For example, if you have dates and can use a TIMESTAMP instead of a DATETIME, do that. If you have numbers and can use a TINYINT in place of a MEDIUMINT or INT, use that. Don't use CHAR unless you are guaranteed to have that number of characters and the column is not null; use VARCHAR in most cases. The most important ingredients for maximizing your performance on a given platform are keeping the database as small as possible through your table design and having the right indexing in place to handle the queries you'll be doing.
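A runnable sketch of the single-table-plus-index approach, again using Python's built-in sqlite3 as a stand-in for MySQL (table name, column names, and sample data are all hypothetical; in MySQL the timestamp column would be a TIMESTAMP and the seconds column a small integer type, per the datatype advice above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE drive_thru_visit (
        id INTEGER PRIMARY KEY,
        entered_at TEXT NOT NULL,          -- ISO-8601 here; TIMESTAMP in MySQL
        seconds_at_window INTEGER NOT NULL -- TINYINT/SMALLINT in MySQL
    )
""")
# Index the column the time-based reports filter and group on.
conn.execute("CREATE INDEX idx_visit_entered ON drive_thru_visit (entered_at)")

rows = [
    ("2018-03-12 11:05:00", 45),
    ("2018-03-12 11:20:00", 55),
    ("2018-03-12 12:10:00", 80),
]
conn.executemany(
    "INSERT INTO drive_thru_visit (entered_at, seconds_at_window) VALUES (?, ?)",
    rows,
)

# Average time at the window per hour -- the kind of report query
# the index above is meant to support.
report = conn.execute("""
    SELECT strftime('%H', entered_at) AS hour,
           AVG(seconds_at_window) AS avg_seconds
    FROM drive_thru_visit
    GROUP BY hour
    ORDER BY hour
""").fetchall()
print(report)  # [('11', 50.0), ('12', 80.0)]
```

With a few hundred thousand rows and an index on the timestamp column, aggregate queries like this stay fast without splitting the data into per-day tables.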