July 5, 2011 at 7:23 am
David Portas (7/5/2011)
...But it's the same relation variable as before! The atomicity of the string and the structure of the relation has not changed. Just because I intend to use it differently T cannot suddenly stop being a relation...
Is it just me, or does this sound like a quantum theory of normal forms? :laugh:
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 5, 2011 at 9:03 am
SQLkiwi (7/5/2011)
David Portas (7/5/2011)
...But it's the same relation variable as before! The atomicity of the string and the structure of the relation has not changed. Just because I intend to use it differently T cannot suddenly stop being a relation...Is it just me, or does this sound like a quantum theory of normal forms? :laugh:
To me it reads like two or more people are rewriting/debating the original RDBMS theory material, trying to fit things Chris Date or Ted Codd wrote to their own personal thoughts or ideas. I am not a big fan. People come here to learn. They should not be given someone's personal thoughts about someone else's ideas as fact. That is misinformation.
An actual Quote from Chris Date: "There was a general feeling at the time that research belonged to everyone. It was public. People at IBM didn’t like it, but all the papers were available. Ellison felt it was obvious that Codd was onto something, so he built a clone of what Ted outlined"
If you want to keep writing about what you "think" Codd & Date meant please read some of the things they wrote yourself before posting. Try to stay away from what Oracle or IBM re-wrote later. Post some live links to that factual information along with your thoughts.
Since we have several examples in this thread of how not to do that, here are some examples of how to do it.
For anyone who wants to read what Ted Codd wrote about NULL (SQL):
http://en.wikipedia.org/wiki/Null_(SQL) I know it is Wikipedia, but so what, if it is correct.
For anyone who wants to read what Codd & Date wrote about 1NF:
http://www2.yk.psu.edu/~lxn/IST_210/normal_form_definitions.html
FYI: The main definition for 1NF in the link above has the word DOMAIN in it. This is how Ted and later Chris both saw it.
FYI: Chris Date did not work with Ted Codd on the original definition of NULL. Chris came into the picture later.
It is the ternary (three-valued) logic Codd introduced into the RDBMS that most people seem to confuse the most. Codd never said NULL = Unknown.
The Unknown truth value is generated whenever Null is compared with any data value, or with another Null.
Codd saw the need for different levels of NULL in an RDBMS to store and return the lack of a value properly. In 1990 Codd wrote in The Relational Model for Database Management Version 2:
"The single Null mandated by the SQL standard is inadequate, and should be replaced by two separate Null-type markers to indicate the reason why data is missing. These two Null-type markers are commonly referred to as 'A-Values' and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively."
NULL, just like the mostly empty void we call the universe, is a large subject.:cool:
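The three-valued behaviour described above is easy to see in any SQL engine. A minimal sketch, using SQLite via Python's sqlite3 purely as an illustration (the discussion above is not tied to any particular engine):

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

# Comparing NULL with a value, or with another NULL, yields the truth
# value Unknown, which SQLite (like most engines) surfaces as NULL/None.
print(cur.execute("SELECT NULL = 1").fetchone()[0])     # None (Unknown)
print(cur.execute("SELECT NULL = NULL").fetchone()[0])  # None (Unknown)

# A WHERE clause keeps only rows whose predicate is True, so rows where
# the predicate evaluates to Unknown drop out of the result.
cur.execute("CREATE TABLE t (x INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])
rows = cur.execute("SELECT x FROM t WHERE x <> 1").fetchall()
print(rows)  # [] -- the NULL row is neither <> 1 nor = 1
```

Note that the NULL row fails both `x <> 1` and `x = 1`: Unknown is not False, but it is not True either, which is exactly the distinction between NULL (a marker for a missing datum) and the truth value Unknown.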
July 5, 2011 at 4:23 pm
SQLkiwi (7/4/2011)
James Goodwin (7/1/2011)
It would be nice if there was a more clean SQL syntax than (X is NULL OR X <> 'Value'), but that is implementation, not theory.There is an alternative to that, which I use quite a lot:
Nice blog post, Paul. Never thought of trying that route. One more trick in my bag, thanks!
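For readers who want to see the two predicates side by side: the sketch below uses SQLite via Python's sqlite3 as an illustration. SQLite spells its NULL-safe comparison `IS NOT` (SQL Server users would reach for Paul's INTERSECT trick, and the SQL standard offers `IS DISTINCT FROM`); the table `t` and the value `'Value'` are hypothetical.

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE t (x TEXT)")
cur.executemany("INSERT INTO t VALUES (?)", [("Value",), ("Other",), (None,)])

# The verbose form: explicitly let the NULL rows through.
verbose = cur.execute(
    "SELECT x FROM t WHERE x IS NULL OR x <> 'Value'").fetchall()

# SQLite's NULL-safe comparison: IS NOT treats NULL as comparable,
# so the NULL row counts as "not equal to 'Value'" and is kept.
null_safe = cur.execute("SELECT x FROM t WHERE x IS NOT 'Value'").fetchall()

print(verbose == null_safe)  # True -- both keep the 'Other' and NULL rows
```

The point of all these spellings is the same: replace a two-clause predicate whose NULL handling is easy to forget with a single operator whose NULL handling is built in.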
July 5, 2011 at 5:44 pm
SanDroid (7/5/2011)
To me it reads like two or more people are rewriting/debating the original RDBMS theory material, trying to fit things Chris Date or Ted Codd wrote to their own personal thoughts or ideas. I am not a big fan. People come here to learn. They should not be given someone's personal thoughts about someone else's ideas as fact. That is misinformation.
I agree! SanDroid for President! 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
July 7, 2011 at 6:07 am
SanDroid (7/5/2011)
For anyone who wants to read what Ted Codd wrote about NULL (SQL): http://en.wikipedia.org/wiki/Null_(SQL) I know it is Wikipedia, but so what, if it is correct.
It's a pretty good reference. There are some trivial errors (for example, the two distinct null values were introduced more than a decade before the RM V2 book, not in it). Most importantly, it points out that NULL indicates the absence of a value for the datum, not the truth value "unknown".
For anyone who wants to read what Codd & Date wrote about 1NF:
http://www2.yk.psu.edu/~lxn/IST_210/normal_form_definitions.html
It's certainly not what Codd wrote about normal forms. It may be something like some of what CJD wrote (I haven't read all his writings), but referring everything to THE primary key instead of to the set of primary keys (usually now called candidate keys) is not at all Codd's style.
It is the RDBMS Ternary Value theory Codd introduced that most people seem to confuse the most. Codd never said NULL = Unknown.
The Unknown truth value is generated whenever Null is compared with any data value, or with another Null.
That's another of those errors: sometimes comparing NULL with NULL delivers True (Paul's clever use of INTERSECT mentioned above explicitly depends on this), and TC stated that it should deliver True in some cases (cases listed in his SIGMOD 1979 paper, and the Dec 79 ToDS version). Wikipedia even mentions these particular comparisons further down the page.
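That difference between the ordinary comparison operator and the set operators is easy to demonstrate. A small sketch using SQLite via Python's sqlite3 (as an illustration; the behaviour is the same in SQL Server and is what Paul's trick relies on):

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

# Ordinary scalar comparison: NULL = NULL evaluates to Unknown
# (surfaced to the client as NULL/None).
eq = cur.execute("SELECT NULL = NULL").fetchone()[0]
print(eq)  # None

# Set operators, by contrast, treat two NULLs as duplicates of each
# other, so intersecting two single-NULL-row results yields a row.
rows = cur.execute("SELECT NULL INTERSECT SELECT NULL").fetchall()
print(rows)  # [(None,)] -- the NULLs match for duplicate-handling purposes
```

So "comparing NULL with NULL yields Unknown" is true of `=`, but not of the equality test the engine performs for duplicate elimination in INTERSECT, UNION, DISTINCT, and GROUP BY.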
If you read the Dec 79 ToDS paper (it's available somewhere online without requiring an ACM library subscription, but I'm not telling you where because that would be encouraging copyright violation :angry: - as well as encouraging TC's view that the research belongs to everyone ;-)) you will see that TC had introduced the two NULLs and the 4VL system before then, but thought it reasonable to merge the NULLs and stick to 3VL for many purposes.
If you read the 1990 book, in particular the rules for replacing NULLs with actual values (you can download the book from the ACM library for $15, if I recall correctly), you'll see that the i and a NULLs aren't actually "inapplicable" and "applicable", since that would have precluded conversions between them.
Tom
July 14, 2011 at 3:54 am
David Portas (7/5/2011)
Tom, I'm well aware that the relational model and TTM require relations that consist of atomic attributes. What I see as a problem is that your article uses another criterion for 1NF:
Is this or isn't it in 1NF? That depends on what the database is expected to do with it
You define 1NF based on intended usage. This doesn't make any sense to me. Suppose I create a relation variable using my RDBMS and call it T. T happens to have a string attribute that encodes multiple telephone numbers. That's OK because a string is by definition an atomic value. According to your definition T is in 1NF as long as "the database doesn't know anything about that encoding". I don't have any problem with that so far because I don't ever intend to split out the phone numbers in the database. T is in 1NF.
Later on, however, I change my mind. I want to split out the phone numbers using some code in my database. As you put it: "There will have to be code in queries to decode that complex string". Now you say that T is not in 1NF. But it's the same relation variable as before! The atomicity of the string and the structure of the relation have not changed. Just because I intend to use it differently, T cannot suddenly stop being a relation. So your suggested criterion is an additional requirement for 1NF over and above the requirement for T to be a relation - a relation being a thing which is defined by structure alone and never by intended usage. According to you, the relation variable T can change from moment to moment from being in 1NF to not being in 1NF depending on what code exists in my database.
I think you are missing the point - it's not about precluding the database from holding certain objects, it's about ensuring that the objects it holds have, so far as the database is concerned, only types which are directly supported by the relational calculus or algebra language made available by the database system. It's not an issue of abstruse theory, but a really simple and straightforward practical issue.
It's not the decision to have the object in the database that breaks 1NF - it's the decision to have the object in the database and require it to be manipulated by the database language (relational algebra or calculus with the interfaces of supported types) in a manner that requires understanding of its structure without including support for its type in the database language. And don't try to tell me that the type of a list of telephone numbers is string - because correct typing precludes nonsense like allowing 'XvMZQ54(w@^%' as list of telephone numbers.
If you modify the database system to recognise a type "phone number list" and add the operations that implement this type (append number to list, delete number from list, is number in list, is list empty, create empty list, merge two lists, and so on) then you are still in 1NF. If instead you decide to write those longhand in the database language, using the primitives for operating on lists of characters (strings) provided by your database, then you are not in 1NF, because the object is not atomic in the database.
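To make the distinction concrete: the kind of abstract type being described might look like the following sketch. This is a hypothetical illustration in Python (the class name, the operations, and the validation rule are all assumptions, not anything from Tom's article); the point is that users of the type only ever see these operations, never the encoding underneath.

```python
import re

class PhoneNumberList:
    """A hypothetical 'phone number list' type exposing only the
    operations listed above; its internal representation is hidden."""

    # Illustrative validation rule: digits, spaces, hyphens, optional +.
    _PATTERN = re.compile(r"^\+?[0-9][0-9 -]{4,20}$")

    def __init__(self, numbers=()):
        self._numbers = []
        for n in numbers:
            self.append(n)

    def append(self, number):
        # Correct typing rejects nonsense like 'XvMZQ54(w@^%'.
        if not self._PATTERN.match(number):
            raise ValueError(f"not a phone number: {number!r}")
        if number not in self._numbers:
            self._numbers.append(number)

    def delete(self, number):
        self._numbers.remove(number)

    def __contains__(self, number):
        return number in self._numbers

    def is_empty(self):
        return not self._numbers

    def merge(self, other):
        merged = PhoneNumberList(self._numbers)
        for n in other._numbers:
            merged.append(n)
        return merged
```

If the database language supported such a type natively, a column of it would be atomic for 1NF purposes: queries would call `is number in list` rather than picking apart an encoded string with substring operations.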
Tom
July 15, 2011 at 12:37 pm
Tom,
Firstly, great article. I actually stumbled onto it after reading your 2NF article via the link back here, and wanted to start from the beginning. Glad I did 😉
After reading, I was like "Wow... Good times..."
Then, after reading the comments, I was like "Aww... Bad Times..."
I will echo others in saying: good job defending your stance. The scrutiny has been, at times, excessive (I thought), but I think you really made some great points both in your article and your replies. Honestly, coming from an App Dev background into this realm of DBAs, the article definitely helped put the concept of normal forms under a more focused lens for me (all theoretical nuance squabbling aside).
I also liked how you addressed the concept of context as it relates to "atomicity". So many data elements that we take for granted as being "simple" in nature rarely ever are (one nit that went unpicked: Full Name... atomic? yes/no/may-be-so), and are in fact themselves hierarchical arrays of data carried over from "real life" attempts at bringing a measure of order to The Chaos. I definitely think it was an important point to raise. I think using terms like "atomic" and "natural" in all this lingo tends to get us thinking too abstractly, and we forget that we're not actually talking about the indestructible, elemental entities of the cosmos. These data elements are, more often than not, just a bunch of nonsense concocted by a band of numbskulls (and I do not exclude myself from being a subset of that domain) 😛
I think you touched on a great point, which really drives it home for me:
Tom.Thomson (6/30/2011)
...a structure that is not future-proof is not a good structure. When considering what the business rules are when determining what is 2NF, 3NF, EKNF, BCNF or 4NF one should consider what changes in business rules may happen next year, or the year after, or even later. For 1NF one should consider what someone might produce as a new requirement on the database in the future, as a new requirement may make a list-valued attribute into a violation of 1NF.
And as most users here probably have come to realize, being cogs in the corporate machine, this one principle definitely holds true in the business world: "The only constant is change". Ultimately, this is a good thing overall, I think. Humans are inherently less than perfect, and therefore our attempts to define and organize our world will be equally so. Our propensity for constant change gives us the ability to adapt to the unforeseen eventualities that will inevitably come due to our insufficient understanding of the world we seek to explain.
This means that anything we build must facilitate (or at least account for), and not restrict this element of change. A system that is too rigid will not hold up over time.
All this puts me in a mind to do what I love to do most in these situations (as some may have come to realize from my previous posts)... Over-simplify a vast and complex issue by distilling it down to a quaint movie reference! 😀
I call it "The Matrix Paradox" (I probably read it somewhere, so I'm sure I'm not the only one who calls it that... 😛 ) ...
Premise: Humans are imprecise, inaccurate, and to some degree unpredictable. For that reason, they are incapable of existing within a "perfect" system, because it would be too rigid, leaving no tolerance for error (which humans generate with great proficiency).
Dilemma:
- For the system to function properly it should be perfect
> (i.e. - in perfect adherence with "the rules". Note: rules created by whom? Exactly.)
- If the system is perfect it doesn't work
Summary:
As The Architect put it, "The problem ... is choice." Or to put it a different way (and I think more accurately): The problem is human error. Because in the world of 1 and 0, true and false, right and wrong... Choice is an illusion, and in the end you are either right or wrong. In adherence or in error.
Train of Thought... derailed! 😎
July 15, 2011 at 7:06 pm
pjcrout (7/15/2011)
Tom, firstly, great article. I actually stumbled onto it after reading your 2NF article via the link back here, and wanted to start from the beginning. Glad I did 😉
After reading, I was like "Wow... Good times..."
Thanks for that - every comment like that encourages me to keep on writing.
Then, after reading the comments, I was like "Aww... Bad Times..."
I will echo others in saying: good job defending your stance. The scrutiny has been, at times, excessive (I thought), but I think you really made some great points both in your article and your replies. Honestly, coming from an App Dev background into this realm of DBAs, the article definitely helped put the concept of normal forms under a more focused lens for me (all theoretical nuance squabbling aside).
I don't think anything was actually excessive - it's all fair comment and debate. I'm glad, though, to hear that you thought I did a good job in defending my position. I came from a mathematician and app developer background into database research, and after a lot of other things (largely not DB related) into doing the DBA thing as well as much else (a couple of "you name it, I do it" jobs), which I think is very like coming from a developer background into the DBA realm, so we probably have a lot of ideas in common. Anyway, I am trying to write for practical DBAs who got into that role accidentally (as I did) and not for the theorists. I do find it an advantage to know some of the theoretical nuances too, though - it saves a lot of trouble when designing a schema.
I also liked how you addressed the concept of context as it relates to "atomicity". So many data elements that we take for granted as being "simple" in nature rarely ever are (one nit that went unpicked: Full Name... atomic? yes/no/may-be-so), and are in fact themselves hierarchical arrays of data carried over from "real life" attempts at bringing a measure of order to The Chaos. I definitely think it was an important point to raise. I think using terms like "atomic" and "natural" in all this lingo tends to get us thinking too abstractly, and we forget that we're not actually talking about the indestructible, elemental entities of the cosmos.
Ted Codd, who introduced the atomicity idea into relational database theory (which he invented, of course), didn't use the term "atomic" in his early writings. Instead he said "simple". I like that term (indeed I prefer it to atomic, because atomic doesn't properly reflect his insistence - which I absolutely believe was correct - on sticking to a first order relational calculus) but I don't use it because sticking to "atomic" usually gets me into fewer silly arguments (I'm very lazy, so avoiding pointless arguments is one of my prime objectives).
These data elements are, more often than not, just a bunch of nonsense concocted by a band of numbskulls (and I do not exclude myself from being a subset of that domain) 😛
Join the club - I'm definitely a member of that domain :-D
Premise: Humans are imprecise, inaccurate, and to some degree unpredictable. For that reason, they are incapable of existing within a "perfect" system, because it would be too rigid, leaving no tolerance for error (which humans generate with great proficiency).
Dilemma:
- For the system to function properly it should be perfect
> (i.e. - in perfect adherence with "the rules". Note: rules created by whom? Exactly.)
- If the system is perfect it doesn't work
Summary:
As The Architect put it, "The problem ... is choice." Or to put it a different way (and I think more accurately): The problem is human error. Because in the world of 1 and 0, true and false, right and wrong... Choice is an illusion, and in the end you are either right or wrong. In adherence or in error.
That only works if logic is restricted to black and white (true and false, no other possibilities). Constructive mathematicians work in a world where there are infinitely many shades of grey. Even non-constructivists are happy to use all sorts of MVLs instead of classical two-valued logic (the difference between them and constructivists is that the constructivists refuse to use the 2VL black-and-white logic, because they (or we - I'm not admitting to being on either side of that debate) believe it just doesn't work). Relational database theoreticians have a similar division - anti-MVL and pro-MVL; it's kind of sad that in the relational database world they don't respect each other's positions, unlike the situation in mathematics.
Tom
July 16, 2011 at 2:00 am
Tom.Thomson (7/15/2011)
That only works if logic is restricted to black and white (true and false, no other possibilities). Constructive mathematicians work in a world where there are infinitely many shades of grey. Even non-constructivists are happy to use all sorts of MVLs instead of classical two-valued logic (the difference between them and constructivists is that the constructivists refuse to use the 2VL black-and-white logic, because they (or we - I'm not admitting to being on either side of that debate) believe it just doesn't work). Relational database theoreticians have a similar division - anti-MVL and pro-MVL; it's kind of sad that in the relational database world they don't respect each other's positions, unlike the situation in mathematics.
Fair enough 😉 I can appreciate the "shades of grey" stance. For practical purposes, a tolerance for "shades of grey" is generally the more flexible approach, and more in line with "common sense" and real life practices, etc... as it seems we can rarely ever make a clear distinction about real world affairs that we might then attempt to model. I guess the question there though would be: Is it because there is no distinction, or is it because we just can't see it?
For the sake of debate (which I love) let me play devil's advocate and posit these notions and see how far down the rabbit hole we can go:
- Anything outside and/or BOTTOM/TOP of the domain of "true or false" is either inapplicable to true or false or unknown/unknowable as true or false and therefore cannot be evaluated as such. In this example, the other values outside true and false, or the grey, is akin to the mind filling in the blanks of an incomplete picture, and not actually an implication that a BOTTOM or TOP actually exists or that there is actually anything outside that domain. The grey here is only an approximation on the unknown, unknowable, or the seemingly inapplicable. Once known, knowable, or applicable though... it becomes either true or false. That is a limitation on "the knower", not true/false.
- As for any known "value" that is seemingly "grey", seemingly not quite true or false - as opposed to chalking this up to a multi-valued system of logic, one could instead attribute this to a layer of complexity (contrived or otherwise) on top of a fundamental true/false framework, but still essentially a subset of that framework. That's not to say there's anything wrong with going for complexity. The world can be a pretty complex place, so having a complex system of logic can help with understanding interactions at higher orders of magnitude, without the need to evaluate each element individually (i.e. - When counting on your fingers, the finger is only ever up or down, either there is/true/up or there is not/false/down, but since we only have a limited number of fingers to work with... decimal system).
- Any known value perceived to be somewhere either in between true and false, or other than, and not one or the other, is actually not one value (simple/atomic), but only represented as one value as a result of higher levels of complexity on top of the fundamental. At its core, though, it is a collection of true and false values, which if decomposed would yield only the two values, regardless of how infinitely or infinitesimally true or false they might be.
I'm sure I have violated one or more mathematical principles somewhere along the way, so feel free to school me 😉 I'm more of a philosopher than a mathematician any way 😎
This is what happens when I can't sleep... :hehe:
July 28, 2011 at 9:24 am
I finally read this article (and amazingly, all 70 posts...something I NEVER do). Very good article and like several others have pointed out your responses in these discussions are exemplary. Tom you are truly "a gentleman and a scholar". I'm going to read the 2NF and 3NF articles now. Thank you for writing.