Multiple databases within Instance vs One global database

n1concepts · February 22, 2018

Hi,

I have some questions based on a MS-SQL project I'm, now, apart of to see if the current db structure the most efficient - basically, asking about db provisioning 101 / theory as I'm getting push back on recommended edits to improve the current db design based on 'expected' future growth.

Here's the current db structure for the application:

There is one MS-SQL database server, running one instance, to which the application is (currently) creating a new database for each (new) client that signs up to the service - all on that one db server.
The current devs stated that they designed the db process in this manner so that each client's data would be, completely, separate from the other clients' data & this way db administrative efforts would be less hectic & more efficient

My view: I think there is a better way to address the db / data requirements

- see below to which I would like feedback which approach is more efficient, short & long-term based on resource requirements & load / latency.

Issue / Concern: (Giving a scenario of having - i.e 1k or even 10k clients at some point to which will accumulate, total, of around 30G of data over time)

Q: Won't having that number of (individual) db's - all, literately, an exact replicate of the other db - actually create more (unnecessary) latency / load on the db server / resources then just having ONE 'master' database & housing ALL the clients' data within that (optimized) database (of course Indexing crucial to keep that db optimized to handle the total load for all client data; it's same as having multiple db's in my view based on proper indexing - just a more efficient design)?

OR (even better approach based on the expected growth of data, overall)

Distribute the, pertinent, data sets based on their role / service - individually - across separate database servers - so that the load / latency issues minimized for each db server (itself).

For example: all email marketing (results) data would be housed on db1 server whereas all client specific data would be housed on db2 server.

Reason: the bulk email marketing application demands a tremendous amount of resources & disk space so by isolating those services / operations removes contention between the two db's (Yes / No?)

To hook the separate sets of data when required (client or email data)?

Answer: - the SQL queries would just JOIN / UNION data sets from the (separate) specified MS-SQL db servers based on the required data to produce SQL results (JOIN / UNION already the method with current BI queries using to date; I'm just suggesting the data be segmented to different db servers based on their role / service to the application).

Note: what I'm wanting to get other's views on is this:

Q: does it make sense to have multiple copies of the same (exact) db schema for client data vs one 'master' db schema which can house ALL the data so there is less db administrative work (1st off) & (2ndly) reducing the actual load on the 'one server' itself. Note: I know storage pools can be implemented to expand disk space for the MS-SQL service but that's another topic regarding 'limited' disk space issues (per server) as the data grows, collectively.

I would appreciate feedback on those questions to get others opinion(s) based on their experience in working with a 'global' / distributed cloud-based applications & MS-SQL.

I have some ideas -

1. compiling all the (current), individual, client dbs to the one 'global' clients db then implementing a data warehouse (DW) to pull all data that's over seven days into the DW to which I'll then be able to manage said data using a variety of 'external' sources - MySQL Storage Area Network (SAN) solution & even Excel spreadsheets <for very old data that will be older than five years - to free disk space on the DW, etc...>

2. Implement Redis cache / cluster to offload (default) queries from both the 'source' db servers & DW for the majority of the queries - being the more requested data set / queries - will (now) be in-memory for each client. Note: Redis cache will NOT load those client's data sets until said client (any member of that group) 'successfully' authenticated into the system.

3. (still speaking on Redis) - data sets that have been sitting idle for a specified period of time (short # of days) would be flushed from cache to free of memory.

Note: the cluster will be defined so that there is ample redundancy aside from persistence of the DW which will have 50G of disk space along with the use of an external SAN solution which will house another 50G of disk space - acting as source / destination to house Excel files along with the oldest sets of data within MySQL.

Using data mgt services, the MS-SQL (DW) will pull any 'requested' data back into the DW from the SAN (destination) to serve the application, dynamically.

Thus, this solution would make the current db solution more efficient & robust to expand both - horizontally & vertically - based on biz / data demands.

Ok, those are my thoughts to make the current db design more efficient with the objective of thinking 'short & long-term' based on expected growth of data.

Let me know your thoughts - suggestions / comments - all welcome.

Thx in advance for responses!

n1concepts · February 22, 2018

FYI: I left out one important feature with what I'm thinking may be a more suitable db design - Partitioning

Note: I'm thinking to address the lag time / latency if a particular table grows - i.e. millions or even billions of records.

Answer: by partitioning the table(s) by week, month, etc... that will reduce the search (latency); limiting the amount of data required to be searched / sorted to form the query.

Again, the question is which is better based on the suggestions I've posted?

1. Stay with the (individual) db's per client

or

2. compile all the client data into one MASTER db then implement partitioning and ensure proper indexing on tables to segment searches to keep load times to a minimum.

Again, thanks, again, for your comments / suggestions!

requinix · February 23, 2018

The current devs stated that they designed the db process in this manner so that each client's data would be, completely, separate from the other clients' data & this way db administrative efforts would be less hectic & more efficient

That's short-sighted. It doesn't matter where data lives as long as the application keeps everything separate, and this only "helps" administrative stuff if you have to work within the database directly (SQL queries and such). But multiple databases like this is almost always a bad idea in the long run.

One database is how it should be.

Q: Won't having that number of (individual) db's - all, literately, an exact replicate of the other db - actually create more (unnecessary) latency / load on the db server / resources then just having ONE 'master' database & housing ALL the clients' data within that (optimized) database (of course Indexing crucial to keep that db optimized to handle the total load for all client data; it's same as having multiple db's in my view based on proper indexing - just a more efficient design)?

Not really. What matters is how much data there is and how it's organized in the tables - multiple databases won't cause a performance problem simply because there are more than one.

Distribute the, pertinent, data sets based on their role / service - individually - across separate database servers - so that the load / latency issues minimized for each db server (itself).

That is actually an appropriate concern. It doesn't matter that the data is distributed (in fact that generally makes things more awkward to work with) but having that load distributed is reasonable. If you're at the point where multiple servers is worth the investment, of course.

You could have one lower-spec server handling data that's used less frequently or less aggressively (eg, big complicated queries) and another higher-spec server handling the data that gets processed more heavily.

For example: all email marketing (results) data would be housed on db1 server whereas all client specific data would be housed on db2 server.

Reason: the bulk email marketing application demands a tremendous amount of resources & disk space so by isolating those services / operations removes contention between the two db's (Yes / No?)

Right.

To hook the separate sets of data when required (client or email data)?

Answer: - the SQL queries would just JOIN / UNION data sets from the (separate) specified MS-SQL db servers based on the required data to produce SQL results (JOIN / UNION already the method with current BI queries using to date; I'm just suggesting the data be segmented to different db servers based on their role / service to the application).

And moving data across servers makes that harder, though IIRC SQL Server has native functionality for that. But if you can JOIN tables across servers in a query, I would expect that the processing has to happen on one server - it's not like the query is split in half and each piece executed by the server housing the associated data. So simply moving the data for the sake of moving it might not actually do you any good if you're querying both servers' data together frequently.

Q: does it make sense to have multiple copies of the same (exact) db schema for client data vs one 'master' db schema which can house ALL the data so there is less db administrative work (1st off) & (2ndly) reducing the actual load on the 'one server' itself.

No. Not to me. If you have a particular client or three that is putting an exorbitant load on your server then you need do something about getting them a dedicated instance - as in a separate copy of the application and database on separate servers. But in general it should be the same database/server for everyone.

Meanwhile your thoughts about migration and the distributed architecture I can't help too much with.

Answer: by partitioning the table(s) by week, month, etc... that will reduce the search (latency); limiting the amount of data required to be searched / sorted to form the query.

Make sure you have proper indexing before thinking about this. Indexes are almost always enough to do the job without separating data.

kicken · February 23, 2018

If it's just the dev wanting things to be separate then it sounds to me like just being lazy and not wanting to develop in a way that they data can be combined in the DB but isolated in the app. A few applications I've written over the years have used separate databases, but that's mainly because they were not designed to be a multi-client system so it was easier to just setup a new database for a new client. At the same time, the number of clients has been fairly low (half-dozen or less) so it's not a big deal.

You might make sure there's not any other reason for keeping it separate though, such as legal reasons. For example one place I worked for years ago had acquired a business and as part of the merger certain things had to be kept physically separated between the two businesses for legal reasons that I'm not privy too. Some of your clients could have it in a contract that their data must be isolated from any other clients. Maybe your sales guys know the data is isolated and are using that as part of their sales pitch.

Assuming no legal/contractual reasons to use separate databases, I'd probably lean toward combining the databases and letting the application deal with limiting access to data as needed.

I don't have any real practical experience with any large multi-client systems though, so take my advice at your own risk. The one system I worked on that was designed to multi-client and used a single database approach was a bit of a flop and never got much use so I don't really know if the decisions I made in it's design were good or bad.

Sign In

Multiple databases within Instance vs One global database

Recommended Posts

n1concepts

Link to comment

Share on other sites

n1concepts

Link to comment

Share on other sites

requinix

Link to comment

Share on other sites

kicken

Link to comment

Share on other sites

Archived

Browse

Activity

Important Information