July 13, 2011 at 9:20 pm
Comments posted to this topic are about the item Second Normal Form
Tom
July 14, 2011 at 1:27 am
Great explanation - however in this case I think you would agree it would be prudent to store the Rate_Applying and Years_Applying in the Current Assignment table, to prevent them being implicitly changed retrospectively when they are changed in the PayGradeDetails table. This appears to be a violation, but it is essential to retain correct data over time.
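Something like this, perhaps (just a sketch - table and column names are assumed from the article's example, not taken verbatim):

-- Hypothetical sketch: snapshot the rate values into the assignment row
-- at the time the assignment is made, so that later edits to
-- PayGradeDetails cannot silently rewrite history.
CREATE TABLE Current_Assignment (
    Employee_ID     int           NOT NULL,
    PayGrade        int           NOT NULL,
    Assignment_Date date          NOT NULL,
    Rate_Applying   decimal(10,2) NOT NULL,  -- copy of the rate in force when assigned
    Years_Applying  int           NOT NULL,  -- copy of the years band when assigned
    CONSTRAINT PK_Current_Assignment PRIMARY KEY (Employee_ID)
);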
July 14, 2011 at 1:48 am
I must admit that I got a bit lost by the description in your first paragraph, but the example that you gave made it clear and confirmed to me that 2NF was what I thought it was - phew!
Thanks very much, looking forward to 3NF and beyond...
July 14, 2011 at 2:38 am
Lempster (7/14/2011)
I must admit that I got a bit lost by the description in your first paragraph, but the example that you gave made it clear and confirmed to me that 2NF was what I thought it was - phew! Thanks very much, looking forward to 3NF and beyond...
Thanks for that.
I should have used a better description (the form of words I used was pretty poor). Something like "If X is a set of prime attributes and A is a non-prime attribute and the value of A is determined by the values of X then X contains the whole of at least one candidate key". Your comment made me read that first paragraph again and I found it confusing this time too, and I'm not sure what I wrote is actually right (it looks a bit too much like a definition of 3NF).
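For reference, a common textbook formalisation (my paraphrase here, not the article's final wording): a relation $R$ is in 2NF iff for every non-trivial functional dependency $X \to A$ holding in $R$, where $A$ is a non-prime attribute, $X$ is not a proper subset of any candidate key of $R$; equivalently, every non-prime attribute is fully functionally dependent on each candidate key.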
EDIT: I provided Steve with new text for the definition and it is now incorporated in the article. Thanks, Lempster, for helping make this a better article.
Tom
July 14, 2011 at 2:41 am
The author's point about A being "fixed by" the values of X presumably means a functional dependency of the form X->A. Functional dependencies are the basis of 2NF. It's important to remember that the concept of functional dependency was developed without regard for nulls, and it is an assumption of 2NF (and 3NF, BCNF, etc.) that relations consist only of values and not nulls.
The definition of a candidate key as a set of columns that "could reasonably be used as the primary key" is a bit strange. A candidate key is quite precisely a minimal superkey. There is no difference at all between a candidate key and a primary key (or even "the" primary key) so I'm not sure what is meant by a "reasonable choice" in that context. It's a pity the explanation of candidate keys isn't spelt out more clearly because it is very often a point of confusion for some people.
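To illustrate with a hypothetical table (names invented): Employee_ID and NI_Number are each candidate keys - minimal superkeys - and one of them is simply chosen as the primary key.

CREATE TABLE Employee (
    Employee_ID int         NOT NULL,
    NI_Number   char(9)     NOT NULL,
    Name        varchar(50) NOT NULL,
    CONSTRAINT PK_Employee PRIMARY KEY (Employee_ID),  -- the candidate key chosen as primary
    CONSTRAINT UQ_Employee_NI UNIQUE (NI_Number)       -- the other candidate key
);
-- (Employee_ID, Name) is a superkey but not a candidate key:
-- it is not minimal, because Employee_ID alone is already unique.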
July 14, 2011 at 2:49 am
martin-1049293 (7/14/2011)
Great explanation - however in this case I think you would agree it would be prudent to store the Rate_Applying and Years_Applying in the Current Assignment table, to prevent them being implicitly changed retrospectively when they are changed in the PayGradeDetails table. This appears to be a violation, but it is essential to retain correct data over time.
I think I would keep the rate history for each person in the PayGradeDetails table, and assignment history in a third table, if there was any question of trying to reconstruct historical charges. But the ability to do historical reconstructions is non-trivial, and doing it safely requires extra normalisation.
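A rough sketch of one possible shape for that (table and column names invented for illustration):

-- Rate history: one row per pay grade/years band per period in force.
CREATE TABLE PayGradeDetails (
    PayGrade   int           NOT NULL,
    Years      int           NOT NULL,
    Rate       decimal(10,2) NOT NULL,
    Valid_From date          NOT NULL,
    Valid_To   date          NULL,  -- NULL = still in force
    CONSTRAINT PK_PayGradeDetails PRIMARY KEY (PayGrade, Years, Valid_From)
);

-- Assignment history in a third table, keyed by employee and start date.
CREATE TABLE Assignment_History (
    Employee_ID int  NOT NULL,
    PayGrade    int  NOT NULL,
    Start_Date  date NOT NULL,
    End_Date    date NULL,
    CONSTRAINT PK_Assignment_History PRIMARY KEY (Employee_ID, Start_Date)
);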
Of course concern for time-dependent data is what led Chris Date to invent his version of 6NF, and maybe one day there will be an SQL Server Central article on that (if there is it won't be written by me - I have no practical experience of using 6NF, so it's just theory to me, and I don't believe people should write articles on parts of normalisation that they have no practical knowledge of).
Tom
July 14, 2011 at 4:52 am
Nice article, thanks Tom.
Looking forward to the rest of the series.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
July 14, 2011 at 5:05 am
David Portas (7/14/2011)
The author's point about A being "fixed by" the values of X presumably means a functional dependency of the form X->A. Functional dependencies are the basis of 2NF.
That's sort of true. But Occam's razor is a useful tool when trying to present things in a simple manner, and what a functional dependency is all about is that some values are fixed by some other values; at least I hope that's what you mean by a functional dependency (it's what every academic paper I've read that contained the term used it for).
If I were intending to write about formal theory rather than something useful for practical database people, I would of course treat FDs as a special case of MVFDs (an FD is an MVFD where the range value set is a singleton) and then for 2NF (and 3NF too) discuss only a special case of FDs (where the range domain is a 1-tuple). I don't think that would be much use to most of the SQLServerCentral audience, who are practitioners, not theorists.
It's important to remember that the concept of functional dependency was developed without regard for nulls, and it is an assumption of 2NF (and 3NF, BCNF, etc.) that relations consist only of values and not nulls.
It's important to remember that Codd and Heath were discussing NULL (not two different NULLs, just one) way back when they were preparing their early papers on normalisation, so it is extremely unlikely that NULLs weren't considered at all. They didn't work out the effect of nulls on normalisation at that point - which is why Codd later wrote that their introduction had meant that normalisation had to be revisited - but certainly there are no grounds now, three decades after NULLs were first proposed, for claiming that relational theory has no room for them or that 2NF, 3NF, BCNF, etc. (presumably "etc" means EKNF, 4NF, the various 5NFs, DKNF and Date's 6NF) exclude them. For example, if 5NF had been intended to exclude NULL, the paper introducing it would have said so - its author, Ron Fagin, was working with Codd and had just reviewed Codd's RM/T and "...Capture More Meaning" papers for him, so he certainly knew all about NULLs. For 6NF, I imagine Date did intend that NULL be forbidden - but that's the only one I believe does exclude NULL.
The definition of a candidate key as a set of columns that "could reasonably be used as the primary key" is a bit strange. A candidate key is quite precisely a minimal superkey. There is no difference at all between a candidate key and a primary key (or even "the" primary key) so I'm not sure what is meant by a "reasonable choice" in that context. It's a pity the explanation of candidate keys isn't spelt out more clearly because it is very often a point of confusion for some people.
Just a fortnight ago one of you anti-NULL purists was claiming that there was no such term as "primary key" in relational theory, only in SQL! (Complete nonsense, as you will agree.)
I think most people will understand that I used "the primary key" in the sense used in the paper which introduced the relational model to the world: a primary key that has been chosen to be "the" primary key, as opposed to just "a" primary key. It is a perfectly sensible description of a candidate key to say that it is something that can be so chosen - that's why the word "candidate" is there; a candidate is something that can be chosen (whether by fate, by a schema designer, or otherwise) - and the term "candidate key" was in fact introduced, rather later, as a replacement for "primary key". The concept of superkey is of course completely superfluous.
Tom
July 14, 2011 at 5:20 am
Tom,
Great article... although I too was a little lost by the first paragraph, your example made it clear.
I would also add effectiveStartDate and effectiveEndDate fields to the PayGrade_Details table, so that you could see that Charlie Fixit spent 3 years at paygrade 12 and then got bumped up to paygrade 13, and when it happened. Then I would just grab the paygrade record whose start date/end date falls within the proper date range that I need... I'm not sure if that is the correct way to do it, or what NF that is, but that's the way I've been doing it.
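Roughly like this, I mean (a sketch - the effective-date columns are the ones described above; the other names are assumed):

-- Find the rate in force for a pay grade on a given date.
DECLARE @PayGrade int = 12, @AsOfDate date = '2009-06-30';

SELECT Rate
FROM   PayGrade_Details
WHERE  PayGrade = @PayGrade
  AND  @AsOfDate >= effectiveStartDate
  AND  (effectiveEndDate IS NULL OR @AsOfDate < effectiveEndDate);
-- a NULL effectiveEndDate marks the row currently in force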
That way I can pull historical reports...I guess I could also use a historical or archive paygrade table and move the old records into that table when they are changed...but I'm not sure which way is the recognized "proper" way to do it.
But I digress...Thanks, George
July 14, 2011 at 7:35 am
Great Post....
July 14, 2011 at 7:55 am
Great article - didn't read any previous articles before (assuming you started at 1). It should be noted that with any production DB system, even one that seems simple, changing it to conform to DB normalization rules will create a lot of other work: fixing queries against said DB, broken reports, increased space usage, etc...
July 14, 2011 at 9:40 am
Smendle (7/14/2011)
Great article - didn't read any previous articles before (assuming you started at 1). It should be noted that with any production DB system, even one that seems simple, changing it to conform to DB normalization rules will create a lot of other work: fixing queries against said DB, broken reports, increased space usage, etc...
True. However to play devil's advocate:
- the report is already "broken" since it (potentially) returns bad data.
- space usage would likely DECREASE, since you're not duplicating all of the employee-dependent data.
It's definitely a non-trivial task to do this after the fact, but there is a gain to be found in there.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part... unless you're my manager... or a director and above... or a really loud-spoken end-user... All right - what was my emergency again?
July 14, 2011 at 10:34 am
Smendle (7/14/2011)
Great article - didn't read any previous articles before (assuming you started at 1). It should be noted that with any production DB system, even one that seems simple, changing it to conform to DB normalization rules will create a lot of other work: fixing queries against said DB, broken reports, increased space usage, etc...
That's why you should try to do all the normalisation while designing the schema, before writing code for queries and reports.
Personally I recommend that when you have a rough schema design you proceed like this:

1. List all the business rules that you would like the schema to enforce for you without requiring code to do it, and express those rules as functional dependencies wherever you can.

2. Use Bernstein's algorithm to derive a schema which conforms to 3NF (actually it will conform to EKNF, which is better than 3NF, even though Bernstein was only targeting 3NF when he designed the algorithm).

3. Check that all business rules are now covered by key constraints; this will involve adding NOT NULL, UNIQUE, and foreign key constraints as needed (see the sketch after this list). It may also involve adding some auxiliary tables if any of the weird cases for which BCNF was intended exist: that is usually a better bet than actually going to BCNF, since in those cases BCNF can't enforce all the functional dependencies, while EKNF can if you add the auxiliary tables.

4. Then look at any business rules which couldn't be expressed as functional dependencies. If you didn't have to add auxiliary tables to enforce FDs without going to BCNF, you can normalise to 4NF to get a schema which enforces any business rules expressed as multi-valued dependencies as well as those expressed as functional dependencies. Or, if you are lucky, you may find that the multi-valued dependencies have no impact on the bits of the schema that needed the auxiliary tables, so you can enforce them by taking the other parts of the schema to 4NF. (There is no easy route like Bernstein's algorithm to 4NF - or even to BCNF - so this is a bit of a slog.) At worst you will end up with an EKNF schema with some bits in 4NF and some bits not in 4NF, and there the only advice I can offer is to examine each unenforced multi-valued dependency very carefully and decide whether to enforce it (taking another part of the schema to 4NF but losing some enforcement of simple FDs) or not. Either way you will need code - either a good stored procedure-based interface, with no direct access from the application to tables and views, only to SPs, or a set of triggers - to enforce the business rules that the key constraints can't enforce (check constraints can help sometimes, but they can't do the whole job).

5. Then look and see if any of your tables have non-trivial join dependencies, and if so whether they are implied by the candidate keys. If there are any that are not, you may want to split those tables (i.e. go to 5NF) - but as with the MVDs there can be a trade-off here, since the split may lose enforcement of FDs (and you may find that the number of tables involved in joins is getting too big, potentially making the code hard to understand).

6. If you have a lot of time-dependent information - lots of things have values associated with a period in time, and you want to retain all the history - you may want to eliminate all non-trivial dependencies (go to 6NF), but I can't offer any advice there; it's not within my field of competence.
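A contrived sketch of step 3, loosely following the article's pay-grade example (names invented for illustration): the FD (PayGrade, Years) -> Rate is enforced by making that pair the key of the table that carries it, and the rule that every assignment must refer to a real pay grade/years band becomes a foreign key.

-- The FD (PayGrade, Years) -> Rate is enforced by the primary key:
CREATE TABLE PayGradeDetails (
    PayGrade int           NOT NULL,
    Years    int           NOT NULL,
    Rate     decimal(10,2) NOT NULL,
    CONSTRAINT PK_PayGradeDetails PRIMARY KEY (PayGrade, Years)
);

-- The referential rule is enforced by a foreign key constraint:
CREATE TABLE Current_Assignment (
    Employee_ID int NOT NULL,
    PayGrade    int NOT NULL,
    Years       int NOT NULL,
    CONSTRAINT PK_Current_Assignment PRIMARY KEY (Employee_ID),
    CONSTRAINT FK_Assignment_PayGrade FOREIGN KEY (PayGrade, Years)
        REFERENCES PayGradeDetails (PayGrade, Years)
);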
Anyway, once you've done all that hard work of schema design - always being sure to allow as much flexibility as possible, since the business rules may change next week (or next year, or whenever) - you will find it's enormously easier to write simple and bug-free code than if you hadn't done it. If you just throw something together without any thought for normalisation you will end up with some very complex code, probably find it hard to maintain and hard to enhance, and discover that you need to normalise to survive - but by then it is much harder than it was up front.
Tom
July 14, 2011 at 12:26 pm
Tom.Thomson (7/14/2011)
Lempster (7/14/2011)
I must admit that I got a bit lost by the description in your first paragraph, but the example that you gave made it clear and confirmed to me that 2NF was what I thought it was - phew! Thanks very much, looking forward to 3NF and beyond...
Thanks for that.
I should have used a better description (the form of words I used was pretty poor). Something like "If X is a set of prime attributes and A is a non-prime attribute and the value of A is determined by the values of X then X contains the whole of at least one candidate key". Your comment made me read that first paragraph again and I found it confusing this time too, and I'm not sure what I wrote is actually right (it looks a bit too much like a definition of 3NF).
EDIT: I provided Steve with new text for the definition and it is now incorporated in the article. Thanks, Lempster, for helping make this a better article.
TOM: I have never seen 2NF explained without a Venn diagram or a set diagram before. Not bad! Mere mortals might have grasped this faster with a set diagram, but the examples and descriptions taught the reader what this is in data relation theory, not set math. NICE! 😎
July 14, 2011 at 3:02 pm
Hi Tom,
Again a nice article, and looking forward to 3NF. Are you contemplating going any further than that?
Just one question: further up in the discussion you mention Bernstein's algorithm. I've never heard of that, and a (rather quick) search on Google didn't yield any useful results. Could you elaborate a little bit on that?
Regards,
Jan