Use of 'index' tables

mayfair · April 6, 2010

Hi All,

Im in the process of designing/building a new database with a colleague that we're having some minor issues with. Rather than try and explain it, its probably easier if I just give an example so here goes:

We have a 2 or 3 tables in the database that require a 'county' field as part of an address. Obviously there is only a finite number of possible counties, but my colleague has suggested that we create a new table (called counties) that contains 2 columns: county_id which is an auto-increment PK and county_name which is a varchar that contains the name of the county. His logic is that because 'county' is repeated in a couple of tables throughout the database, it would be better to store counties in its own table and reference just the county_id as and when we need to refer to it. I see were he's coming from, but I just don't really like the feel of it! More than anything, it makes the tables a bit confusing in their raw format because its just bunches of numbers which don't mean a lot without the reference table.

County is the example I used above, however we have the same scenario with about 5 other fields in our database. My question is this: is it really that much more efficient or beneficial to store these fields in their own seperate table or are we over-compliacting things? Ideally, I would like to see 'county' stored as a varchar the same as address1, address2, town etc. is. The database will eventually have about 300k records in it and will grow by 100-150k each year so performance does have to be taken into consideration.

Thanks

ignace · April 6, 2010

These questions should not be answered by us but by your business requirements. If you for example want to list these counties it is more beneficial when you query a table with ~100 records and just:

SELECT * FROM county

Then that it is when you would have to query ~300k rows like:

SELECT DISTINCT county FROM users

So like I said these questions can not be answered by us as we don't know anything about your domain and we can only make best guesses as to what your approach should be. Remember: a good planning has never hurt anyone.

mayfair · April 6, 2010

Thanks for your reply ignance.

I appreciate the answer depends on how the database will eventually be used, I just wanted to make sure there wasn't a definiate "yes, always do that!" answer because of some obvious reason I had missed. For the record, we will not be needing to query the database just to display a list of counties; they form a required part of the address system that will be pulled out on an individual customer basis as and when it's needed.

This project is not something we have been asked to do, it's just something we're looking at in our spare time. Fortunately this means we have all the time in the world to plan it, I just want to make sure we do it right from step one

Mchl · April 6, 2010

Actually, if you want to comply with data normalisation rules, then the answer is 'yes, always do that'.

The table might look a bit more confusing as you say in it's 'raw' form, but you gain other benefits instead:

1. It's easier to cater for countries changing names (whether that's because a country actually changes name, or you just made a spelling mistake once, and discover it half a year after that).

2. Tables take less space both in disk and in memory, which is always a good thing.

mayfair · April 6, 2010

1. It's easier to cater for countries changing names (whether that's because a country actually changes name, or you just made a spelling mistake once, and discover it half a year after that).

2. Tables take less space both in disk and in memory, which is always a good thing.

I did wonder if that was the case, but in the event of a spelling mistake, surely a global find and replace wouldn't be too difficult to pull off? In terms of disk space, would a completely new table with 2 columns really be much more smaller than storing say a VARCHAR(50)?

Mchl · April 6, 2010

1. It's easier to cater for countries changing names (whether that's because a country actually changes name, or you just made a spelling mistake once, and discover it half a year after that).

2. Tables take less space both in disk and in memory, which is always a good thing.

I did wonder if that was the case, but in the event of a spelling mistake, surely a global find and replace wouldn't be too difficult to pull off? In terms of disk space, would a completely new table with 2 columns really be much more smaller than storing say a VARCHAR(50)?

Thing is, if you store country name once, you only need to change it in one place, without having to worry, that you forget to change it somewhere. It is easy, when you have three tables in your db, but will get complicated when you have 30.

As far as VARCHAR vs (SMALL)INT is concerned. The smaller column size, the smaller, and thus faster is the index created on this column (and you will most likely want to have indexes on foreign key columns). Again, this is something you will not notice on small database, but that can make a huge difference when dealing with large amounts of data.

Sign In

Use of 'index' tables

Recommended Posts

mayfair

Link to comment

Share on other sites

ignace

Link to comment

Share on other sites

mayfair

Link to comment

Share on other sites

Mchl

Link to comment

Share on other sites

mayfair

Link to comment

Share on other sites

Mchl

Link to comment

Share on other sites

Join the conversation

Browse

Activity

Important Information