Jump to content

Duplicate Data Myth?


fry2010

Recommended Posts

This is more a general question:

 

I have searched around a lot and found that there seems to be the arguement that if you have two or more tables with the same data , that the database scheme is not designed well, since there is redundant data.

 

But in my view there are some situations where it is better to catogorise the same data into seperate tables since it can reduce the time to run a select query, which I find select queries to be run more often than others. Could an expert here explain errors in my thinking here or better inform me, since I dont trust a lot of the other places I find information from.

 

If I give an example:

 

Say a new user registers. I put their data into a table called 'await_validation'. Then once they validate their account the information gets removed from that table, and placed into a different table called 'active_users'. To me this has several benifits over using a field in the 'active_users' table called 'validated'.

 

Basically I think it gives the overall benifit of:

  • Running any statement with a WHERE clause will run more efficiently, since there are less rows AND one less field to process

Link to comment
Share on other sites

Base your reasoning on facts not merely believes. Putting data in another table because one field would cost you severly in performance is nonsense.

 

Thats why I asked. I would like facts.

 

But if you have 100,000 records and you have to search those records is it really not going to impede performance?

 

Data duplication should always be an optimization, not a starting point.  Summary fields/tables are always a perfectly acceptable trade-off

 

Sorry fenway, im not understanding this... Are you saying that basically dont create duplicate data, unless it is an optimization?

 

Where could I actually learn correct ways of database structure? It seems to be allusive to me.

Link to comment
Share on other sites

Fenway said that data duplication should be something you need to consider when things get really heavy not at the start of your project. MySQL can easily handle a few million records (with proper indexing). And if you then would feel the need you can create a summary tables with optimized indexes, or partition your tables, or ..

Link to comment
Share on other sites

Fenway said that data duplication should be something you need to consider when things get really heavy not at the start of your project. MySQL can easily handle a few million records (with proper indexing). And if you then would feel the need you can create a summary tables with optimized indexes, or partition your tables, or ..

 

Brilliant, thanks for that. I suppose I am going a bit OTT thinking Im going to have thousands of people on my little crappy site.

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.