Duplicate Data Myth?

fry2010 · February 27, 2011

This is more a general question:

I have searched around a lot and found that there seems to be the arguement that if you have two or more tables with the same data , that the database scheme is not designed well, since there is redundant data.

But in my view there are some situations where it is better to catogorise the same data into seperate tables since it can reduce the time to run a select query, which I find select queries to be run more often than others. Could an expert here explain errors in my thinking here or better inform me, since I dont trust a lot of the other places I find information from.

If I give an example:

Say a new user registers. I put their data into a table called 'await_validation'. Then once they validate their account the information gets removed from that table, and placed into a different table called 'active_users'. To me this has several benifits over using a field in the 'active_users' table called 'validated'.

Basically I think it gives the overall benifit of:

Running any statement with a WHERE clause will run more efficiently, since there are less rows AND one less field to process

ignace · February 27, 2011

Base your reasoning on facts not merely believes. Putting data in another table because one field would cost you severly in performance is nonsense.

fenway · February 27, 2011

Data duplication should always be an optimization, not a starting point. Summary fields/tables are always a perfectly acceptable trade-off.

fry2010 · February 28, 2011

Base your reasoning on facts not merely believes. Putting data in another table because one field would cost you severly in performance is nonsense.

Thats why I asked. I would like facts.

But if you have 100,000 records and you have to search those records is it really not going to impede performance?

Data duplication should always be an optimization, not a starting point. Summary fields/tables are always a perfectly acceptable trade-off

Sorry fenway, im not understanding this... Are you saying that basically dont create duplicate data, unless it is an optimization?

Where could I actually learn correct ways of database structure? It seems to be allusive to me.

ignace · February 28, 2011

Fenway said that data duplication should be something you need to consider when things get really heavy not at the start of your project. MySQL can easily handle a few million records (with proper indexing). And if you then would feel the need you can create a summary tables with optimized indexes, or partition your tables, or ..

fry2010 · February 28, 2011

Fenway said that data duplication should be something you need to consider when things get really heavy not at the start of your project. MySQL can easily handle a few million records (with proper indexing). And if you then would feel the need you can create a summary tables with optimized indexes, or partition your tables, or ..

Brilliant, thanks for that. I suppose I am going a bit OTT thinking Im going to have thousands of people on my little crappy site.

ignace · February 28, 2011

I'm sure those who started Facebook never guessed they would invite the whole world in on their crappy site.

fenway · March 1, 2011

I'm sure those who started Facebook never guessed they would invite the whole world in on their crappy site.

Maybe not, but they didn't optimize it until the day AFTER they did.

Sign In

Duplicate Data Myth?

Recommended Posts

fry2010

Link to comment

Share on other sites

ignace

Link to comment

Share on other sites

fenway

Link to comment

Share on other sites

fry2010

Link to comment

Share on other sites

ignace

Link to comment

Share on other sites

fry2010

Link to comment

Share on other sites

ignace

Link to comment

Share on other sites

fenway

Link to comment

Share on other sites

Archived

Browse

Activity

Important Information