Scalability vs flexibility, any solutions?

fe_81 · January 13, 2011

Hi,

I have a question about what is the most suitable database and design principles when you want to build a scalable system, but you also want to keep the code flexible to changes. I know this depends a lot on the case, so I’m going to go into more details.

Let’s say we have product database, with lots of products. Products are related to several other things, like stores selling them, manufacturer who builds them, customers who bought them and their details, etc. All these will have some own attributes. Now, we will have a very complex admin panel, where you can search products based on keywords and other attributes and sort them in lists based on different rules, such as which has sold most (number of customers), etc. This would be trivial to implement if we would not need to care about scalability. Most searches and sorting of lists can be easily implemented with a few joins and sorting by COUNT(…) or whatever we want. But the problem is that this won’t scale, or will it?

We probably need to partition the database, due to high traffic, so we can’t really use joins. In worst case we need to split some table into different shards. This will require really complex php-code, and some of the features may not even be possible to implement. Making changes to the admin-panel and adding new features will take a lot of time. Are there any good solutions how to do these complex queries with joins and sorting by count when you have partitioned the database so that some tables are on one server and others on other servers. And what about shards?

We are using Zend Framework, which comes with TableModel. Is it a good idea to use table model in our situation, or is it better to just write our queries by hand? Would some ORM framework like Doctrine work (not familiar with it).

Can we overcome these issues by using some noSQL database instead of MySQL, like MongoDB, that takes care of all the scaling? Does anyone know if complex queries, with joins and sorting by count, work well in MongoDB? Different websites seem to have different opinions. Some say MongoDB is not good for complex queries others say it is.

Can someone give some advice or point me in the right direction where I can find info about this?

Thanks,

-F

fe_81 · January 13, 2011

I forgot to say that by "partitioning" I mean vertical partitioning, i.e. splitting the database into several databases so that some tables are in one database one server and other tables in other databases, each database on an own server.

I saw that MySQL has some kind of partitioning support (horizontal, i.e. splitting one table so that some rows are in one DB and other rows in other DBs), but nowere did it become clear whether you can split so that these DBs are on different servers. Does anyone know about this?

Can someone give some advice? Thanks!

thehippy · January 14, 2011

You start by asking about general design, then get into 'what we' have type descriptions. I'm assuming you have an existing private code-base application.

For database scaling solutions, server, schema ,query and index optimization, a MySQL DBA is a very valuable and worthwhile investment. You can post a job post on the MySQL forums with what your project needs are and requirements for a DBA or DB Developer are. Let me reiterate, this is worthwhile!

You could also put your install on some expensive hardware, an iSeries server from IBM might be worthwhile, I think those start at around 14k USD. Throwing raw power at a DB is always a nice option.

There are certainly things you can do on your own to make your application more efficient. Start with monitoring and profiling, identify your bottlenecks, be it queries, network latency, slow scripts, identify memory hogs. You can garner a great deal of info this way without affecting your application. With info in hand you can make changes to caching solutions, refactoring code for efficiency, server settings and so forth.

Personally I'm not a fan of ORM solutions, they tend to slow me down and add a unnecessary layer between application and datastore. People tout the idea of platform independence, making your code more portable, but databases are as much an integral part of web applications as choosing the language you implement your application with, in my opinion at least.

noSQL should not be an option unless you understand why its a good option, not because you've heard good things about it. It has a particular use-case scenario and doesn't perform well if that's not the case.

The database (horizontal) partitioning you're asking about is to have the DB server fundamentally manage the data on disk rather than the operating system, most OSs cluster commonly opened files together, while the DBMS does the same with its data but better.

MySQL doesn't support the partitioning of data across discs itself, where data could be split across multiple hard drives to gain the read and write access speeds of multiple discs. The closest solution do this is to setup your MySQL server with RAID solution, which does increase responsiveness.

To have a database split over many servers, well the database isn't really split, replication is a common solution for database clusters, data is synchronized over many servers. There are one or many master servers that aggregate and sync data and many slave servers that deal with queries from applications. There is a bit of delay between slaves getting updated, but its usually minimal.

Shards are independent instances of your entire database, usually on different physical servers and managed centrally. This requires some very strict data integrity checking and a mechanism for syncing data if you want to centralize the data. A modern example of shards are each of the individual world servers for WoW, each server is a shard, each server has a fundamental dataset, but each shard has its own independent data as well.

fe_81 · January 14, 2011

Thank you for your answer.

So all MySQLs partitioning features only has to do with partitioning on one server, not splitting the database to many servers?

I know how replication works and we are going to use replicas for searches and similar that we cannot cache with Memcached.

We are looking for a solution where we can scale horizontally by adding more servers and not by buying better hardware. We need to come up with a relatively good application design that allows for both flexibility and scalability.

-F

Sign In

Scalability vs flexibility, any solutions?

Recommended Posts

fe_81

Link to comment

Share on other sites

fe_81

Link to comment

Share on other sites

thehippy

Link to comment

Share on other sites

fe_81

Link to comment

Share on other sites

Archived

Browse

Activity

Important Information