November 22, 2017 at 4:42 pm
Lynn Pettis - Wednesday, November 22, 2017 1:37 PMWas I taught wrong regarding normalization? I was taught that you normalize to a point, say 3NF, then de-normalize where appropriate, knowing the trade-offs and possible issues that you would need to account for in the design.
That's practical work, but it's then backing off from normalization. It's what you should do, but when you move away from normalization, you're not normalizing.
November 22, 2017 at 4:43 pm
Lynn Pettis - Wednesday, November 22, 2017 3:34 PMJust some former MVP espousing normalization and his never wills in the same breath.
I'd ignore that thread. It's silly.
November 22, 2017 at 4:58 pm
Lynn Pettis - Wednesday, November 22, 2017 3:14 PMHugo Kornelis - Wednesday, November 22, 2017 3:06 PMLynn Pettis - Wednesday, November 22, 2017 1:37 PMWas I taught wrong regarding normalization? I was taught that you normalize to a point, say 3NF, then de-normalize where appropriate, knowing the trade-offs and possible issues that you would need to account for in the design.Correct. Well, except that you should normalize up to 5NF before you start denormalization.
Many people in the world appear to confuse "de-normalized" with "non-normalized"Well, I was using 3NF as an example. I would also think it would be dependent on the complexity of the data being modeled as well. Not sure I would go to 5NF for a simple database system. Thinking about it, I really think it depends on the data anomalies you encounter during the process. If I remember correctly, the order of normalization is 1NF, 2NF, 3NF, BCNF, 4NF, 5NF, 6NF. After 3NF I have a hard time remembering what data anomalies you are overcoming. I would actually have to look those up.
You should always check for 4NF and 5NF. If the data model is not too complex, there will usually be no violations once you're already in 3NF / BCNF. But if there are, they are easy to fix.
For your memory: 4NF and 5NF are about what is officially called "multivalued dependencies" and "join dependencies". In plain English, you only get violations of these normal forms if you have a single candidate key on three (or more) columns in the same table; this design is correct if it represents a single fact about the three attributes; it violates 4NF or 5NF if it is actually a combination of two or three independent facts about those attributes.
Example: "Hugo likes to sit on the couch in the living room" - a table with Person, Furniture and Room, with key on the three, would be correct if that is the only fact to be represented (I don't like to sit on the chair in the living room, only the couch; or I really prefer the chair over the couch when in the guestroom). However, if that fact is simply a conclusion from "Hugo likes to sit on a couch" (no matter where), and "Hugo likes to sit in the living room" (on any available furniture), then the single table violates 4NF and should be split over 2 tables. And then, if you find that the fact "there is a couch in the living room" is relevant as well (take it away and I can't sit there anymore), then the 2-table design would be wrong; the 1-table design would violate 5NF, and you'd have to go for 3 tables instead.
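To make the couch example concrete, here is a minimal sketch in SQLite (via Python); the table and column names are invented for illustration. It shows the 4NF case: when "likes furniture" and "likes room" are independent facts, the decomposed two-table design loses nothing, because the join regenerates every valid combination.

```python
import sqlite3

# Decomposed (4NF) design: two independent facts in two tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE PersonFurniture (Person TEXT, Furniture TEXT, "
            "PRIMARY KEY (Person, Furniture))")
cur.execute("CREATE TABLE PersonRoom (Person TEXT, Room TEXT, "
            "PRIMARY KEY (Person, Room))")
cur.executemany("INSERT INTO PersonFurniture VALUES (?, ?)",
                [("Hugo", "couch"), ("Hugo", "chair")])
cur.executemany("INSERT INTO PersonRoom VALUES (?, ?)",
                [("Hugo", "living room"), ("Hugo", "guestroom")])

# Joining the two tables regenerates every (Person, Furniture, Room)
# combination -- no information was lost by splitting.
rows = cur.execute("""
    SELECT pf.Person, pf.Furniture, pr.Room
    FROM PersonFurniture pf
    JOIN PersonRoom pr ON pr.Person = pf.Person
    ORDER BY pf.Furniture, pr.Room
""").fetchall()
print(len(rows))  # 2 pieces of furniture x 2 rooms = 4 combinations
```

If instead only specific (furniture, room) pairings were liked, the join would manufacture rows that are not facts, which is exactly why the single three-column table would then be the correct design.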
Other normal forms are far less important, in my opinion. 6NF tries to get rid of NULLs in tables by having thousands of tables (basically, where you would have a table with a single-column key and nine non-key columns, 6NF would create nine 2-column tables). It is only potentially useful if you manage to find a DBMS that is optimized to handle this kind of data distribution, and then you'll also need to use a querying language other than SQL, to prevent NULLs from coming right back in as soon as you start querying the data. ONF, which as far as I know is not academically founded, is very similar to 6NF but takes a small step back in a few cases where 6NF would lose the ability to enforce a composite alternate key or a composite foreign key. And DK/NF, finally, is an interesting thought experiment but very easily shown to be unobtainable in very common and simple use cases. A real-world requirement as simple as "no more than 30 students per class" is sufficient to break DK/NF.
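A tiny sketch of the 6NF point about NULLs, again in SQLite with invented names: the key plus each non-key attribute gets its own two-column table, so the base tables never need NULL, but an outer join to reassemble the row brings NULL straight back for any missing fact.

```python
import sqlite3

# 6NF-style decomposition: one table per (key, attribute) pair.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Person (PersonId INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE PersonName (PersonId INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
cur.execute("CREATE TABLE PersonPhone (PersonId INTEGER PRIMARY KEY, Phone TEXT NOT NULL)")
cur.execute("INSERT INTO Person VALUES (1)")
cur.execute("INSERT INTO PersonName VALUES (1, 'Hugo')")
# No phone number recorded: simply no row, instead of a NULL column.

row = cur.execute("""
    SELECT p.PersonId, n.Name, ph.Phone
    FROM Person p
    LEFT JOIN PersonName  n  ON n.PersonId  = p.PersonId
    LEFT JOIN PersonPhone ph ON ph.PersonId = p.PersonId
""").fetchone()
print(row)  # (1, 'Hugo', None) -- the NULL is back as soon as we query
```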
... okay, I'll shut up now
November 22, 2017 at 4:59 pm
Steve Jones - SSC Editor - Wednesday, November 22, 2017 4:43 PMLynn Pettis - Wednesday, November 22, 2017 3:34 PMJust some former MVP espousing normalization and his never wills in the same breath.
I'd ignore that thread. It's silly.
... yeah, so now you've made me curious. Link, for those who don't want to heed your advice?
November 22, 2017 at 5:14 pm
Hugo Kornelis - Wednesday, November 22, 2017 4:59 PM... yeah, so now you've made me curious. Link, for those who don't want to heed your advice?
November 22, 2017 at 6:37 pm
Hugo Kornelis - Wednesday, November 22, 2017 4:37 PMI would not put it that strongly. Sure, BCNF is not always achievable. But very often, it is; so why remove it from a list? Also, though there are cases where BCNF is not achievable yet EKNF is *and* is different from 3NF, they are rare edge cases; I say that EKNF is more of interest to academics than in the real world. When BCNF is achievable, use it to remove some potential anomalies from your schema. When BCNF is not achievable, then document the anomalies and use code to prevent them because the database can't; calling the schema EKNF is nice and all but does not fix the anomalies.
(Also, if you kick out BCNF you are effectively kicking out all the higher forms, since they all imply BCNF. Yet if my database has one or two tables where BCNF is not possible, I should still look for potential 4NF and 5NF violations).
Evidently you are unaware that EKNF can enforce functional dependency sets that BCNF can't.
And you seem to be disregarding the complexity of discovering whether BCNF is achievable for a particular functional dependency set, and the complexity of creating a BCNF schema from a dependency set. In the extremely rare case that there are functional dependencies that can't be handled by EKNF, it may be sensible to look at BCNF, but many such cases also preclude BCNF representation so it's not at all certain to get you anywhere (and certainly doesn't in the cases where going BCNF implies losing the ability to represent something that can be correctly represented in EKNF).
I would be very surprised if there were any significant number of real-world practical cases that BCNF can handle but EKNF can't - that's the minuscule set of edge cases, not the set which EKNF can handle while BCNF can't.
Eliminating multivalued dependencies doesn't eliminate single-valued functional dependencies (unless "multivalued" means "single valued", as is sometimes necessary for those who insist that "multiple-valued dependency" being a special case of "join dependency" means that multivalued dependencies include dependencies which are single-valued functional dependencies), so 4NF doesn't imply any of the lower NFs unless you are either misusing the English word "multiple" or using a definition of 4NF which expressly asks for them. So your claim that all higher levels imply BCNF is rather weak, since 4NF is a higher level.
But 5NF does indeed imply BCNF, I think.
Tom
November 22, 2017 at 7:47 pm
Hugo Kornelis - Wednesday, November 22, 2017 4:58 PMYou should always check for 4NF and 5NF. If the data model is not too complex, there will usually be no violations once you're already in 3NF / BCNF. But if there are, they are easy to fix.For your memory: 4NF and 5NF are about what is officially called "multivalued dependencies" and "join dependencies". In plain English, you only get violations of these normal forms if you have a single candidate key on three (or more) columns in the same table; this design is correct if it represents a single fact about the three attributes; it violates 4NF or 5NF if it is actually a combination of two or three independent facts about those attributes.
Example: "Hugo likes to sit on the couch in the living room" - a table with Person, Furniture and Room, with key on the three, would be correct if that is the only fact to be represented (I don't like to sit on the chair in the living room, only the couch; or I really prefer the chair over the couch when in the guestroom). However, if that fact is simply a conclusion from "Hugo likes to sit on a couch" (no matter where), and "Hugo likes to sit in the living room" (on any available furniture), then the single table violates 4NF and should be split over 2 tables. And then, if you find that the fact "there is a couch in the living room" is relevant as well (take it away and I can't sit there anymore), then the 2-table design would be wrong; the 1-table design would violate 5NF, and you'd have to go for 3 tables instead.Other normal forms are far less important, in my opinion. 6NF tries to get rid of NULLs in tables by having thousands of tables (basically, where you would have a table with a single-column key and nine non-key columns, 6NF would create nine 2-column tables). It is only potentially useful if you manage to find a DBMS that is optimized to handle this kind of data distribution, and then you'll also need to use a querying language other than SQL, to prevent NULLs from coming right back in as soon as you start querying the data. ONF, which as far as I know is not academically founded, is very similar to 6NF but takes a small step back in a few cases where 6NF would lose the ability to enforce a composite alternate key or a composite foreign key. And DK/NF, finally, is an interesting thought experiment but very easily shown to be unobtainable in very common and simple use cases. A real-world requirement as simple as "no more than 30 students per class" is sufficient to break DK/NF.
I agree that checking for 4NF and 5NF violations is always a good idea, but I do that starting from EKNF rather than BCNF. If there are no 5NF violations, there aren't any BCNF violations either (since 5NF implies BCNF) so why go to the pain of looking at BCNF?
I like your nice simple examples of 4NF and 5NF issues - if I were still training youngsters I might use them.
Last time I looked, there was no consensus at all on what 6NF was; there were several candidates. I can only remember 2 of them. Fagin's DKNF looked over-ambitious - new meanings for "domain" and "key" would be needed before DKNF could be a widely applicable form, and the result might or might not be relational. Date's definition of 6NF seemed to slip back and forth between ideas to improve handling of time and of history and wanting to eliminate all non-trivial join dependencies; it was suggested that this could save space, but it would certainly match up with your description of getting 9 2-column tables to replace 1 10-column table and guarantee that there would be no need for NULLs in the base tables, so maybe it's Date's 6NF that you describe. I doubt that ONF can eliminate enough of the silliness to be usable, but according to Date there would be no problem with outer joins generating NULLs; they could just generate default values instead.
Tom
November 23, 2017 at 1:46 am
Either my browser had gone down the drain, or that was some mighty spam this morning. The entirety of Page 1 on the Latest View was filled up O.o. F5'd after making a coffee and they were all gone though. 😀
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
November 23, 2017 at 1:49 am
Lynn Pettis - Wednesday, November 22, 2017 1:37 PMWas I taught wrong regarding normalization? I was taught that you normalize to a point, say 3NF, then de-normalize where appropriate, knowing the trade-offs and possible issues that you would need to account for in the design.
"Normalise until it hurts; denormalise until it works"
Thomas Rushton
blog: https://thelonedba.wordpress.com
November 23, 2017 at 1:50 am
Thom A - Thursday, November 23, 2017 1:46 AMEither my browser had gone down the drain, or that was some mighty spam this morning. The entirety of Page 1 on the Latest View was filled up O.o. F5'd after making a coffee and they were all gone though. 😀
That explains why I've not been seeing much spam here in the mornings - some other spambuster is getting in just before me!
Thomas Rushton
blog: https://thelonedba.wordpress.com
November 23, 2017 at 2:35 am
Steve Jones - SSC Editor - Wednesday, November 22, 2017 5:14 PMHugo Kornelis - Wednesday, November 22, 2017 4:59 PM... yeah, so now you've made me curious. Link, for those who don't want to heed your advice?
Thanks!
Interesting read. In this case, I find myself actually partly in agreement with Scott, except that he uses the wrong terminology. Splitting the date into its parts would not be denormalization in this specific case; it would be fixing what might be a 1NF violation. But note that 1NF is probably the least well-defined of all normal forms. Not the "every table needs a key" part, that is pretty clear, but the "every value must be atomic". So what is atomic? The answer here is: "it depends". On what? On the UoD, short for Universe of Discourse - the data rules as relevant to whoever pays you.
A prime example is a person's address. Most people go for address line 1, address line 2, postal code, city, and country. (And then run into issues when their business expands to the UK with their weird addresses.) When asked why, the 99% answer is "because everybody does it", which basically means they do it that way because that was the best solution for systems that were optimized for magnetic tape. When asked if this design violates 1NF, the only correct answer is "I don't know, it depends on what the business actually does with those addresses". If, for instance, the business needs to plan mail delivery routes, then it makes sense to order mail by street, then first the odd house numbers ascending and then the even house numbers descending, so the mailman can just walk up and down the street while delivering mail and has to cross just a single time (at least in the Netherlands, where usually odd numbers and even numbers are on different sides of the road). So here, address line 1 would be a violation of 1NF; the street name and house numbers are relevant as separate attributes and they need to be stored separately. But that same rule does not apply everywhere. Most businesses I worked for do just one thing with the address: print it on a label and slap it on an envelope. They never do anything with the individual elements of the address, except the country (for legal reasons: tax rules, export documents, etc). So one could argue that the "standard" design is already split too far (I like to call this subatomic, since attributes that were already atomic were still split) and one could simply combine all address lines, the postal code, and the city into a single "address block" column (assuming the DBMS allows line breaks to be stored in the data). While subatomic designs are not technically a violation of 1NF, they are often not considered correct, and are definitely often impractical.
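The mail-route ordering only works if street and house number are separate attributes. A minimal SQLite sketch (street name and numbers invented) of "odd numbers ascending, then even numbers descending":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Address (Street TEXT, HouseNo INTEGER)")
cur.executemany("INSERT INTO Address VALUES (?, ?)",
                [("Dorpsstraat", n) for n in (1, 2, 3, 4, 5, 6)])

route = [r[0] for r in cur.execute("""
    SELECT HouseNo
    FROM Address
    WHERE Street = 'Dorpsstraat'
    ORDER BY HouseNo % 2 DESC,                      -- odd side of the street first
             CASE WHEN HouseNo % 2 = 1 THEN HouseNo -- odd: ascending
                  ELSE -HouseNo END                 -- even: descending
""")]
print(route)  # [1, 3, 5, 6, 4, 2]
```

With the street and number glommed into one "address line 1" column, this ORDER BY is impossible, which is the sense in which that design would violate 1NF for this particular business.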
In the case of the thread above, I think it's fair to say that, based on the question, the logical data model should have separate attributes for at least the MMDD part and the YYYY part of the birthdate; and I would then just split it into all three parts and model BirthYear, BirthMonth, and BirthDay. For the implementation, I might then perhaps decide to de-normalize this back to a single BirthDate column and create computed columns for the components; or I might decide to keep the individual components and add BirthDate itself as a computed column.
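The second option might look roughly like this in SQLite (table and column names invented); a view stands in for the computed column, which in SQL Server would be a computed column on the table itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Store the components as the real attributes...
cur.execute("""CREATE TABLE Person (
    PersonId   INTEGER PRIMARY KEY,
    BirthYear  INTEGER NOT NULL,
    BirthMonth INTEGER NOT NULL CHECK (BirthMonth BETWEEN 1 AND 12),
    BirthDay   INTEGER NOT NULL CHECK (BirthDay BETWEEN 1 AND 31))""")
# ...and derive the full date when needed.
cur.execute("""CREATE VIEW PersonWithBirthDate AS
    SELECT PersonId,
           printf('%04d-%02d-%02d', BirthYear, BirthMonth, BirthDay) AS BirthDate
    FROM Person""")
cur.execute("INSERT INTO Person VALUES (1, 1990, 5, 3)")
bd = cur.execute("SELECT BirthDate FROM PersonWithBirthDate").fetchone()[0]
print(bd)  # 1990-05-03
```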
(Deliberately not putting this discussion in that thread; it would not help the OP)
November 23, 2017 at 3:33 am
Hugo Kornelis - Wednesday, November 22, 2017 4:37 PMI would not put it that strongly. Sure, BCNF is not always achievable. But very often, it is; so why remove it from a list? Also, though there are cases where BCNF is not achievable yet EKNF is *and* is different from 3NF, they are rare edge cases; I say that EKNF is more of interest to academics than in the real world. When BCNF is achievable, use it to remove some potential anomalies from your schema. When BCNF is not achievable, then document the anomalies and use code to prevent them because the database can't; calling the schema EKNF is nice and all but does not fix the anomalies.
(Also, if you kick out BCNF you are effectively kicking out all the higher forms, since they all imply BCNF. Yet if my database has one or two tables where BCNF is not possible, I should still look for potential 4NF and 5NF violations).
Evidently you are unaware that EKNF can enforce functional dependency sets that BCNF can't.
No, I am not. Please read what I wrote. I explicitly mention these cases and call them "rare" and "edge cases". This is based on my observations that (a) I have never seen them yet in the real world, and (b) every time I saw examples in instruction material, the requirements appeared weird, as if someone had to forcefully bend reality to make a point.
Can you provide me with a realistic and common example of a case where (a) EKNF is not identical to 3NF, and (b) BCNF is unachievable?
I also said that, in cases where BCNF is not achievable, creating a non-BCNF schema and calling it EKNF does not fix your issues. You still have a business rule that the schema can't enforce.
That's why I always shoot for BCNF (or rather, 5NF which implies BCNF). And if I run into a situation where BCNF is impossible, then you can call the result whatever four-letter combination you want, I will be busy ensuring that the business rules the schema cannot enforce are properly documented and enforced elsewhere.
I would be very surprised if there were any significant number of real-world practical cases that BCNF can handle but EKNF can't - that's the minuscule set of edge cases, not the set which EKNF can handle while BCNF can't.
There are probably zero cases that BCNF can handle and EKNF can't, since BCNF implies EKNF. When BCNF is achievable, it is *also* EKNF (and 3NF, 2NF, and 1NF). If BCNF cannot be achieved, then there is a combination of functional dependencies that cannot be represented in a pure relational model without either having a key attribute dependent on part of a key, or introducing redundancy. EKNF "fixes" this by saying "oh but it's an elementary key column, that restriction only applies to non-elementary key columns, that is okay". It does not fix the actual problem, which is that the schema allows data that violates the functional dependencies.
Eliminating multivalued dependencies doesn't eliminate single-valued functional dependencies (unless "multivalued" means "single valued", as is sometimes necessary for those who insist that "multiple-valued dependency" being a special case of "join dependency" means that multivalued dependencies include dependencies which are single-valued functional dependencies), so 4NF doesn't imply any of the lower NFs unless you are either misusing the English word "multiple" or using a definition of 4NF which expressly asks for them. So your claim that all higher levels imply BCNF is rather weak, since 4NF is a higher level.
Yes, a functional dependency IS a special case of a multi-valued dependency. This is pretty well-documented. If you think that everyone got it wrong in the past, then you are in luck because disproving a theory is much easier than proving it; all it takes is one single example of a functional dependency that is not a multi-valued dependency and I will never again call a functional dependency a special case of a multi-valued dependency. (Plus, you will be making history.)
However, that is not the reason why I say that all higher forms imply BCNF. I say that because it is explicitly included in the definition of 4NF: "A relation is in 4NF if (a) it is in BCNF; and (b) it has no non-trivial multivalued dependencies".
TomThomson - Wednesday, November 22, 2017 7:47 PMHugo Kornelis - Wednesday, November 22, 2017 4:58 PMYou should always check for 4NF and 5NF. If the data model is not too complex, there will usually be no violations once you're already in 3NF / BCNF. But if there are, they are easy to fix.For your memory: 4NF and 5NF are about what is officially called "multivalued dependencies" and "join dependencies". In plain English, you only get violations of these normal forms if you have a single candidate key on three (or more) columns in the same table; this design is correct if it represents a single fact about the three attributes; it violates 4NF or 5NF if it is actually a combination of two or three independent facts about those attributes.
Example: "Hugo likes to sit on the couch in the living room" - a table with Person, Furniture and Room, with key on the three, would be correct if that is the only fact to be represented (I don't like to sit on the chair in the living room, only the couch; or I really prefer the chair over the couch when in the guestroom). However, if that fact is simply a conclusion from "Hugo likes to sit on a couch" (no matter where), and "Hugo likes to sit in the living room" (on any available furniture), then the single table violates 4NF and should be split over 2 tables. And then, if you find that the fact "there is a couch in the living room" is relevant as well (take it away and I can't sit there anymore), then the 2-table design would be wrong; the 1-table design would violate 5NF, and you'd have to go for 3 tables instead.Other normal forms are far less important, in my opinion. 6NF tries to get rid of NULLs in tables by having thousands of tables (basically, where you would have a table with a single-column key and nine non-key columns, 6NF would create nine 2-column tables). It is only potentially useful if you manage to find a DBMS that is optimized to handle this kind of data distribution, and then you'll also need to use a querying language other than SQL, to prevent NULLs from coming right back in as soon as you start querying the data. ONF, which as far as I know is not academically founded, is very similar to 6NF but takes a small step back in a few cases where 6NF would lose the ability to enforce a composite alternate key or a composite foreign key. And DK/NF, finally, is an interesting thought experiment but very easily shown to be unobtainable in very common and simple use cases. A real-world requirement as simple as "no more than 30 students per class" is sufficient to break DK/NF.
I agree that checking for 4NF and 5NF violations is always a good idea, but I do that starting from EKNF rather than BCNF. If there are no 5NF violations, there aren't any BCNF violations either (since 5NF implies BCNF) so why go to the pain of looking at BCNF?
I use a very different (and, unfortunately, not very well-known) approach to data modelling, called NIAM. It starts with representing the business' data as collections of facts, then goes through a series of formal steps to generate a conceptual model based on those fact collections (called "fact types") that can then through a series of exactly-prescribed steps (and with exactly prescribed, I mean you can program them - I know because I did) be transformed directly into a relational model that is completely 5NF - well, except of course if the data models a UoD where BCNF is impossible, but in that case it would be "EKNF + no non-trivial multivalued dependencies + no join dependencies not implied by a key", and the algorithm would construct a "special constraint" (i.e. documentation for the app developers because it's not enforceable by the DB) for the business rule that makes BCNF impossible.
So I never actively check for BCNF, I just create a schema that is in BCNF when possible.
Last time I looked, there was no consensus at all on what 6NF was; there were several candidates. I can only remember 2 of them. Fagin's DKNF looked over-ambitious - new meanings for "domain" and "key" would be needed before DKNF could be a widely applicable form, and the result might or might not be relational. Date's definition of 6NF seemed to slip back and forth between ideas to improve handling of time and of history and wanting to eliminate all non-trivial join dependencies; it was suggested that this could save space, but it would certainly match up with your description of getting 9 2-column tables to replace 1 10-column table and guarantee that there would be no need for NULLs in the base tables, so maybe it's Date's 6NF that you describe. I doubt that ONF can eliminate enough of the silliness to be usable, but according to Date there would be no problem with outer joins generating NULLs; they could just generate default values instead.
Based on the revision history of the 6NF page on wikipedia, there does not appear to be much dispute anymore. DK/NF was by some authors named 6NF before Date's work, and not by Fagin himself. Date's definition of 6NF is, as far as I know, pretty clear: whereas 5NF eliminates join dependencies not implied by a key, 6NF eliminates all join dependencies.
I never said that ONF eliminates the silliness. It just repairs the issues where 6NF can no longer enforce constraints related to composite keys. (E.g. a "Teachers" table with candidate key {FirstName, LastName}, and a "Subjects" table with candidate key {SubjectName} and FDs {SubjectName} --> {FirstName, LastName} and {FirstName, LastName} --> {SubjectName}. In 6NF, the FDs would create two tables (SubjectName, FirstName) and (SubjectName, LastName), and you can no longer enforce the alternate key on {FirstName, LastName}, nor the referential integrity that the combination {FirstName, LastName} has to exist in the Teachers table. ONF would allow to keep these two columns together even though that violates a join dependency, but still separate all other columns out to separate tables. Yea for silliness.)
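The constraint loss in that Teachers/Subjects example can be shown with a small SQLite sketch (data invented): once the composite key is split across two tables, nothing stops a mixed-up (FirstName, LastName) pairing that matches no actual teacher.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Teachers (FirstName TEXT, LastName TEXT, "
            "PRIMARY KEY (FirstName, LastName))")
# 6NF-style split of the Subjects table: one table per key/attribute pair.
cur.execute("CREATE TABLE SubjectFirstName (SubjectName TEXT PRIMARY KEY, FirstName TEXT NOT NULL)")
cur.execute("CREATE TABLE SubjectLastName  (SubjectName TEXT PRIMARY KEY, LastName  TEXT NOT NULL)")
cur.executemany("INSERT INTO Teachers VALUES (?, ?)",
                [("Ada", "Lovelace"), ("Alan", "Turing")])
# Mix-and-match: "Ada Turing" teaches Math. No such teacher exists, yet
# neither table can reject it, because the composite key was split apart.
cur.execute("INSERT INTO SubjectFirstName VALUES ('Math', 'Ada')")
cur.execute("INSERT INTO SubjectLastName  VALUES ('Math', 'Turing')")

bogus = cur.execute("""
    SELECT f.FirstName, l.LastName
    FROM SubjectFirstName f
    JOIN SubjectLastName l USING (SubjectName)
    WHERE NOT EXISTS (SELECT 1 FROM Teachers t
                      WHERE t.FirstName = f.FirstName
                        AND t.LastName  = l.LastName)
""").fetchall()
print(bogus)  # [('Ada', 'Turing')] -- a phantom teacher the schema cannot prevent
```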
On Date's claim that outer joins could just generate default values instead, I never heard that one; and I would be interested to know what default values he proposes and how the application logic then distinguishes between that value coming from the actual data or that same value representing a missing value.
I've said it before (on my blog), though not as explicitly and bluntly as I am going to here: Date's position on NULL values is wrong, and many of his arguments are based on fallacies. His argument that you can join 6NF tables together without introducing NULL values because you can instead use a default value is a fallacy.
November 23, 2017 at 11:50 am
ThomasRushton - Thursday, November 23, 2017 1:49 AMLynn Pettis - Wednesday, November 22, 2017 1:37 PMWas I taught wrong regarding normalization? I was taught that you normalize to a point, say 3NF, then de-normalize where appropriate, knowing the trade-offs and possible issues that you would need to account for in the design."Normalise until it hurts; denormalise until it works"
My take on it is "Normalize 'til it works". 😉 The only time I denormalize is for temporary or semi-permanent reporting tables and I usually try to avoid that. Of course, there are exceptions to every rule. Materializing a "month" using a persisted computed column for the sake of SARGability in reporting queries, for example. Guess that's where the "denormalize until it works" comes into play. Not sure such a thing qualifies as a "denormalization" though.
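The "materialized month" idea can be sketched in SQLite (table and data invented): SQL Server would use a persisted computed column plus an index, while SQLite's expression index plays a similar role, letting a month-level predicate be satisfied from an index rather than a scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Sales (SaleDate TEXT, Amount REAL)")
cur.executemany("INSERT INTO Sales VALUES (?, ?)",
                [("2017-11-22", 10.0), ("2017-11-23", 20.0), ("2017-12-01", 5.0)])
# Materialize the month in an index so the predicate below is index-friendly.
cur.execute("CREATE INDEX IX_Sales_Month ON Sales (strftime('%Y-%m', SaleDate))")

total = cur.execute("""
    SELECT SUM(Amount) FROM Sales
    WHERE strftime('%Y-%m', SaleDate) = '2017-11'
""").fetchone()[0]
print(total)  # 30.0
```

No base-table data is duplicated here, which is why it's debatable whether this counts as "denormalization" at all.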
--Jeff Moden
Change is inevitable... Change for the better is not.
November 23, 2017 at 11:54 am
Lynn Pettis - Wednesday, November 22, 2017 3:34 PMThom A - Wednesday, November 22, 2017 3:20 PMLynn Pettis - Wednesday, November 22, 2017 3:14 PMHugo Kornelis - Wednesday, November 22, 2017 3:06 PMLynn Pettis - Wednesday, November 22, 2017 1:37 PMWas I taught wrong regarding normalization? I was taught that you normalize to a point, say 3NF, then de-normalize where appropriate, knowing the trade-offs and possible issues that you would need to account for in the design.Correct. Well, except that you should normalize up to 5NF before you start denormalization.
Many people in the world appear to confuse "de-normalized" with "non-normalized"Well, I was using 3NF as an example. I would also think it would be dependent on the complexity of the data being modeled as well. Not sure I would go to 5NF for a simple database system. Thinking about it, I really think it depends on the data anomalies you encounter during the process. If I remember correctly, the order of normalization is 1NF, 2NF, 3NF, BCNF, 4NF, 5NF, 6NF. After 3NF I have a hard time remembering what data anomalies you are overcoming. I would actually have to look those up.
What brought on the sudden self doubt Lynn?
Just some former MVP espousing normalization and his never wills in the same breath.
Heh... since I'm no longer an MVP, I hope like hell that you're not talking about me! 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
November 24, 2017 at 11:21 am
ThomasRushton - Thursday, November 23, 2017 1:49 AMLynn Pettis - Wednesday, November 22, 2017 1:37 PMWas I taught wrong regarding normalization? I was taught that you normalize to a point, say 3NF, then de-normalize where appropriate, knowing the trade-offs and possible issues that you would need to account for in the design."Normalise until it hurts; denormalise until it works"
Probably it's better to normalise until it works, and then denormalise and write the code needed to compensate for the denormalisation (and write it in the database, perhaps in the form of triggers, to validate that all inserts, deletions, and updates really conform to the business rules, as well as making the app attempt to do the right thing so that the database code will never see an error) until the benefit of further denormalisation ceases to outweigh the pain of writing all that extra code. Actually, where possible denormalise by adding computed columns so that no extra code is needed to prevent the denormalisation allowing the app to wreck the data.
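The trigger-based compensation described above might look like this minimal SQLite sketch (table, column, and trigger names invented): a denormalized OrderTotal kept honest by the database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, "
            "OrderTotal REAL NOT NULL DEFAULT 0)")
cur.execute("CREATE TABLE OrderLines (OrderId INTEGER, Amount REAL NOT NULL)")
# Trigger compensates for the denormalized OrderTotal on every line insert.
# A complete version would also cover UPDATE and DELETE on OrderLines.
cur.execute("""CREATE TRIGGER trg_OrderLines_ins AFTER INSERT ON OrderLines
    BEGIN
        UPDATE Orders SET OrderTotal = OrderTotal + NEW.Amount
        WHERE OrderId = NEW.OrderId;
    END""")
cur.execute("INSERT INTO Orders (OrderId) VALUES (1)")
cur.executemany("INSERT INTO OrderLines VALUES (?, ?)", [(1, 10.0), (1, 2.5)])
total = cur.execute("SELECT OrderTotal FROM Orders WHERE OrderId = 1").fetchone()[0]
print(total)  # 12.5
```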
Tom