Dababase Theory

Butterbean · November 18, 2014

I have created a system that uses an API to drop utility data into MSQL Server 2008. Once the data is in the db, a web app can query the data to create electric bills. Currently we are working with only one customer but are entertaining expansion and want to approach the growth from the best possible angle.

Considering that all data is the same type of data but comes from different customers and sources, would it be best to maintain 1 database with all of the utility meter data streaming to it or would it be best to have a new data base with each customer?

Also, assuming we keep it all in the same database, would it be best (assuming we are secure) to have all the individual customers data go to the same tables or have separate tables for separate customers.

I'm pretty certain I know the answer but I'm a novice so that means nothing.

Thank you for any help with this issue.

Edited November 18, 2014 by Butterbean

QuickOldCar · November 30, 2014

You know your business the best.

If you are doing large amounts of data with many customers... having them across different databases and possibly even multiple servers would be the best.

Can most likely get away with doing new tables per customer

If is a multiple table setup per customer I would say own database

I wouldn't even consider all customers in the same table

my first thought would be expansion , don't really want to do piles of queries per customer to populate new tables

in a table you already know is that customer, if was a shared table now has to sift or sort through it

table would become huge

tables can corrupt, want all your customers complaining or just one

also have to consider table locking for inserts and selects

does your api allow the customer direct access? is there security or permissions?

the more you separate your data the easier it will be to maintain your customers and more efficient

ignace · November 30, 2014

It's better to keep it all in one database. If you are unsure of the performance impact then simply set up a test database, fill it with data (try to keep it close to reality using current growth data), add proper indexing, then test your application performance.

You can still add caching front-ends if needed. Of course this means that during development you will have to centralise your queries so that in case you need to add a caching front-end it is easy to add in.

This is the idea:

interface ProductQuery {
  // relevant query methods
}

class DbProductQuery implements ProductQuery {
  // methods
}

class CacheProductQuery implements ProductQuery {
  private $productQuery;

  public function __construct(ProductQuery $fallbackProductQuery) {
    $this->productQuery = $fallbackProductQuery;
  }
}

Replace ProductQuery with your own version, perhaps CustomerQuery and ElectricalBillQuery

interface CustomerQuery {
  public function findCustomerById($id);
  // more methods..
}

interface ElectricalBillQuery {
  public function findElectricalBillById($id);
  // more methods..
}

class DbEntityQuery implements CustomerQuery, ElectricalBillQuery {
  // implement all methods
}

// or separate

class DbCustomerQuery extends DbQuery implements CustomerQuery {
 //
}

And this is just one way for you to go. You can still add scaling, sharding, and other stuff to improve your performance. So get rid of the idea of creating tables or databases on-the-fly because it will only hurt performance. Edited November 30, 2014 by ignace

QuickOldCar · November 30, 2014

So get rid of the idea of creating tables or databases on-the-fly because it will only hurt performance.

My impression is that the "customers" will be something like an entire electric company with all of their customers within.

It sounds like databases or tables wouldn't be made that often as are most likely large accounts.

ignace · December 1, 2014

Uhu, if there was a point in there that proved creating databases and tables on-the-fly is more performant I missed it.

QuickOldCar · December 1, 2014

One table of customers within , could be a million or so records of their mixed clients.

If each customer was own table or database that would eliminate having to sort or create an index.

A query to just an individual database or table and finding a single customer is lots faster than all one table.

ignace · December 2, 2014

Yet why is it then that nobody does this, since this is so performant?

QuickOldCar · December 2, 2014

I'm sure people do it both ways.

Is more than just a performance aspect here though.

My opinion is anyone doing large amounts of data should not hold it all in one table or have extremely large indexes.

Anyone doing big data or real-time data should look into NoSQL

I suggest cassandra

Everyone's needs are not always similar.

I'll just tell you my experiences.

As in my cases the only feasable backup solutions would be mirroring drives or a node replication system... or stopping everything and painfully wait for all the mysql data files to copy to another drive.

I've ran into corruption issues quite a few times due to power outages with sql.

The larger tables are not always repairable.

The risk of losing all the data is not worth it.

If you are talking anywhere under a million records, yes with proper indexing is entirely possible to get reasonable fast results a single table.

Once you start going higher will start seeing the performance degrade and totally reliant on caching.

If the application has constant inserts or table locking, you will see it even slower and have to wait for other processes to complete.

I actually do my search single tables with 5 million sites and over a billion for links.

If was a better solution 7 years ago I would have used it.

It's near impossible to query the larger links table other than a normal query pulling by just an id or a set of them.

I was forced to use sphinx which always has to reindex and does not index all results or in the same way mysql could fetch them.

Another instance of mine was a high traffic, big data feed aggregator doing articles upon thousands of websites.

It was always inserting new data which affected the fetching and display. Again had to rely heavily on caching which sucks if want to keep showing the more recent data, so a lot gets passed by.

I also have a few api systems that partly uses my own simple files and indexing system versus all sql because of the above issues.

I'll surely migrate that all to cassandra.

kicken · December 2, 2014

As in my cases the only feasable backup solutions would be mirroring drives or a node replication system

Those are both things that you should be doing anyway when dealing with large amounts of data. You want to avoid ever having to do a full restore as that process would take a long time.

With drive mirroring and replication the need to dig in your backups should be minimal. To make those backups though you'd have data replicate to a dedicated backup node which can be archived in offline mode without interrupting normal operations.

If you are talking anywhere under a million records, yes with proper indexing is entirely possible to get reasonable fast results a single table.

If you combine proper indexing with partitioning you can query extremely large table without issue. I had a table at one point with near a billion rows and was having slow queries even with an index. Once I discovered partitioning and set that up, queries were back down to around a second. In my case I was using MySQL not SQL Server, but the concept and performance is likely the same.

If the application has constant inserts or table locking, you will see it even slower and have to wait for other processes to complete.

Use InnoDB, it does row-level locking so inserting/updating doesn't cause issues with reading. If you've got multiple servers going with replication, do your select queries only on the slaves and your inserts/updates on the master.

If you know how to properly use a relational database system they will take you a long way. That said, yes they are not always the best tool for the job, which is why other options exist. In the OP's situation though, a relational database is perfect.

A single database for all their client's will work fine. With proper indexing and partitioning it will be very snappy. Keeping everything in a single database will allow them to easily do some aggregate reporting across all their clients if they want in the future. All it'll cost is a single extra field on some tables and indexes to identify which client a customer belongs to.

QuickOldCar · December 3, 2014

Don't want to hijack the OP's thread or anything, sorry OP.

I agree with what you say kicken.

As for my needs...

My search index uses fulltext myisiam using mysql on a windows manual built amp stack.(not wamp or anything similar, they always crashed under loads)

The system uses the search terms as dynamic queries to organize the data, not the data being placed into tags or categories.

It's indexed properly and the only way can do so many at this time.

I've tried and experimented quite a few ways how to go about it over the years, the current way was the only way that could even work with so many records.

It basically runs on 2 intel quad core servers, one for website, the other for crawlers,indexers and snaps images.

Have an additional 5 other servers to turn up the juice a bit, but is a home network so keep it at most to like 10 instances at a time.

I actually made it modular to do multiple servers and multiple instances for expansion, the bottleneck will always be mysql though.

I've considered exporting all the data into a more relationship category system just for display purposes but that will lose all the advanced search functionality it currently has.

I assumed would never be a good idea to have every single possible word maybe even combinations of words as a category anyway.

Actually I have plans in the future to turn it all into an api which will do background queries as json and save them versus my current html cache system.

It's been my personal project anyway I so happen to share, was never in a hurry with it.

kicken · December 3, 2014

I've not really used mysql's or sql server's full text capabilities as I haven't tried building a general search engine or anything like it. However I'd probably say if one was going to do that, they are probably not the right tools for that job. There's likely a reason why google doesn't use a relational database for it's search engine.

ignace · December 3, 2014

MySQL has FULLTEXT indexing on InnoDB starting from 5.6. However I've always used ElasticSearch instead of fulltext indexing.

QuickOldCar · December 3, 2014

I tried innodb recently and it had different limitations than myisiam did.

Thanks for telling me about elasticsearch, I'll try it on my api's and cassandra

That was a main reason never strayed from mysql, to get a full feature search

Sounds like a winner to me, it makes it sound so easy.

supports real time GET requests so good for NoSQL

real time search, reindexing is horrible but that may be too intensive depending the amount of data

I've used lucene a few times in the past, not bad at all.

schema free json

works with partitioning

multitenancy

indices can be divided into shards and does all the delegation,rebalancing and routing

faceted and percolated search for filters and mapping

Sign In

Dababase Theory

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Important Information