Differentiating the 3 Normal Forms (concerning 0NF, 1NF and 2NF)

j.smith1981 · November 28, 2011

I am just concerned I may be thinking about this in the wrong sense and would like someone to correct me if I should be wrong about this assumption.

0NF is pretty easy it's all attributes on the same table, that is a no brainer for myself.

The problem is is when it comes to understanding the 1NF and the 2NF.

1NF is having say a products table like so (this would then have the supplier ID):

ProductID, Product Name, Product Price, VendorID

But the 2NF would have say something along the lines of:

ProductID, VendorID

Containing no exact data about them just links to the other 2 perspective data's? am I going at this in the right sense though?

j.smith1981 · November 28, 2011

Infact no sorry 1NF is no repeating rows as a whole.

Where each column may have 1 more occurence of repeating data, but when you take into account the whole row, at least 1 attribute of that table can have 1 or more repeating rows, but as a whole they can not repeat at all as a whole of attributes.

If that makes any sense?

2NF as I have been studying is by where you have 1 set of data that can be applied to many rows, so you may have studentID repeated in 1NF 10 times, to create a FK on that relevant table, then to reference that by that 1 key to a unique row on another table, so anytime you want to refer to that studentID and change their details, you can do so without having to update many rows of student data on that.

I am currently working on 3NF though but would love to get someones reply stating if I have got this correctly in my head really.

Thank you so much for reading and I look forward to any replies,

Jeremy.

requinix · November 28, 2011

1NF is what most people talk about when they casually mention normalization. It has the most obvious rules, like no duplicate rows and only one value per cell, but it does have other rules, like how column and row ordering should not matter. Oddly, the rest of the casual definition is handled by 3NF: moving values not directly related to a table into their own tables (like keeping user information separate from a forum's thread post information).

2NF is a little harder to understand: values in a row relate to the entirety of the row's key - not just a part of it. I can't come up with a good example so I'll borrow from Wikipedia:

Say you had a table of employees, their address, and their skill. The key would be the employee and the skill, because an employee may have multiple skills and multiple employees may have the same skill. The address is tied to the employee only - it has nothing to do with their skill - so the table violates 2NF. To fix this you'd need a table containing just the employee and address.

As for what you're talking about, it sounds like you're thinking of either 1NF (no duplicate rows) or 3NF (values in a row relate directly to the key, not to other values).

If you're still unsure, do you have an example table and data?

j.smith1981 · November 30, 2011

No I don't actually have any data really what I was after was just definitions as to what is what and what level data should be.

I have another general question, probably specific to any kind of company that holds SKU's and has their own database to do this (though I doubt any database designer that's good with some amount of experience would most likely avoid for obvious reasons).

Like say I have an orders table, all the stuff like the order amount, pre tax and including tax, who they paid through and the status things like that right?

I then have a poorly designed database by where the SKU is used as a key that would presumably be in the 3rd Normal Form right? I mean it would have the other data about that product which would concern the order, obviously this is a bad idea, so you'd shift that to an order id no and make that the 2nd Normal Form wouldn't you?

Since if say I was to replace that product with another say an updated version but with the same SKU, that would cause data problems I know, but am I thinking this in the sense of the correct normal forms, since we aren't talking about the actual products table, we are merely talking about products being added to a table that have been ordered on what occasion.

We kind of have this problem with the ecommerce that I work on, one of the many reasons why I am asking this stuff to make a better database for our own purposes and want to really do this properly.

Thank you ever so much for your help so far, it's great, and I look forward to your next reply,

Jeremy.

requinix · November 30, 2011

Like say I have an orders table, all the stuff like the order amount, pre tax and including tax, who they paid through and the status things like that right?

I then have a poorly designed database by where the SKU is used as a key that would presumably be in the 3rd Normal Form right? I mean it would have the other data about that product which would concern the order, obviously this is a bad idea, so you'd shift that to an order id no and make that the 2nd Normal Form wouldn't you?

You'd have a 3NF database if the orders table only kept information directly related to the order itself. You would need at least two tables: one for the order information and one for the products in the order. That second table would not include product information. Depending on what else you're doing with the database, however, you may want to include the price of each product because it's the price at the time of the order.

orders:
* orderID (PK)
* customerID (FK)
* shippingID (FK)

order_products:
* orderID (PK)
* productID (PK)
* quantity
* price

For strict 3NF you couldn't have a subtotal or total in the order because they deal with the sum of product prices and any shipping charges. However in reality you would because doing the subqueries to get those numbers are bothersome. It's okay to sacrifice some normalization for caching, and more so if the data should never change (because once the order is complete it serves only a historical purpose).

An alternative would be to keep a product history and relate the order to an instance of the product in time.

order_products:
* orderID (PK)
* productHistoryID (PK)
* quantity

productHistory:
* productHistoryID (PK)
* productID (FK)
* startDate
* endDate
* price

Naturally it can get much more complicated than that.

Since if say I was to replace that product with another say an updated version but with the same SKU, that would cause data problems I know, but am I thinking this in the sense of the correct normal forms, since we aren't talking about the actual products table, we are merely talking about products being added to a table that have been ordered on what occasion.

There are only two good possibilities:

1. The SKU is a unique identifier. If so then it can only be used on one product ever. It cannot be reused.

2. The SKU is a simple identifier and can be reused. If so then you need a separate and unique identifier and that's what you'd be using in relationships in the database.

You could realistically design a database where the SKU is part of a composite key, and the other part is (for example) a start and end date that never overlaps on other products with the same SKU. That means you'd have to store dates everywhere you need to reference a product. Now you might actually need to do that anyways but the job is still harder than it would be if you just used a single identifier.

j.smith1981 · December 13, 2011

Sorry I haven't replied to this for a while.

But one thing I was notified about by a colleague of mine at work was each SKU like you said is unique so a product might be the same product but as the database would see it that has been used up and it's a new version of that product maybe a new cartridge with the same information so you'd get a whole new product even though it would have the same SKU, so their in effect 2 different products as the database see's it but to you and your customers its the same product, but the database won't see it like that.

There's many occasions when we do that at work, so you never in effect delete a product ever, or you'd be in effect causing data duplication when there's no need for it.

requinix · December 13, 2011

Ah, but the SKU isn't unique. There are multiple, distinct products sharing the same one: the fact that only one is available at a time doesn't matter because they all exist at once in the database.

So #2: use a typical unique identifier (eg, AUTO_INCREMENT in MySQL) to identify each product and to associate it with other entities in the database. The SKUs alone can only refer to the current version of a product; add in a date or sequence number or something else (like that unique identifier) to get a specific version of product.

j.smith1981 · December 14, 2011

Oh sorry yes that's what I meant probably didn't describe it in that manner.

So in effect you'd never set the SKU as being unique then in an SQL schema?

I have seen though on many ecommerce open source and paid for solutions that do set the SKU or a productcode as being unique though.

Just a little confused about that.

I mean one way around that would be I suppose to make your own system based prefix maybe on them or something, a compound key in effect but hmmm would that not be contradicting the rules of that level? :confused:

requinix · December 14, 2011

So in effect you'd never set the SKU as being unique then in an SQL schema?

Not by itself. If you can pair the SKU with something else and have the pair together be unique, you can put a UNIQUE key on the two. Otherwise just use an INDEX.

I have seen though on many ecommerce open source and paid for solutions that do set the SKU or a productcode as being unique though.

Just a little confused about that.

Then those solutions don't allow for the SKU/code to be reused across multiple products. Or they might define a "product" to be a combination of things. But if the SKU alone isn't enough to identify a product then it alone can't be unique.

I mean one way around that would be I suppose to make your own system based prefix maybe on them or something, a compound key in effect but hmmm would that not be contradicting the rules of that level?

Yes, a compound key is exactly what I'm talking about. For example, if you had a sequence number then there could be a UNIQUE(SKU, Sequence) while a UNIQUE(SKU) wouldn't be enough.

Prefixes generally aren't good ideas because they end up being difficult to manage: you're combining two bits of information into one cell. In most cases if you're considering adding a prefix to data that didn't already have one then you're violating normal form.

j.smith1981 · December 15, 2011

Ahh that's really good I will definately use your guide as a way to go about doing Normalisation.

Thank you so much!

Jeremy.

Sign In

Differentiating the 3 Normal Forms (concerning 0NF, 1NF and 2NF)

Recommended Posts

j.smith1981

Link to comment

Share on other sites

j.smith1981

Link to comment

Share on other sites

requinix

Link to comment

Share on other sites

j.smith1981

Link to comment

Share on other sites

requinix

Link to comment

Share on other sites

j.smith1981

Link to comment

Share on other sites

requinix

Link to comment

Share on other sites

j.smith1981

Link to comment

Share on other sites

requinix

Link to comment

Share on other sites

j.smith1981

Link to comment

Share on other sites

Join the conversation

Browse

Activity

Important Information