"Real" copy of Codd's 12 (13) Rules for RDBMS

  • David Portas (8/1/2010)


    I don't have a problem with "outer joins" by the way. I think it's a good idea for a relational language to have a syntax shortcut for a join combined with a union. The outer join operator in SQL is unfortunately less powerful than it should be because it forces nulls upon the user for the union part of the query where it is normally much more convenient to return other values instead. Sadly, there is no generally reliable and concise way to do that in SQL so we are stuck with nulls. Of course those nulls always get removed for the end user because no one wants nulls in their end results.

    I'm appalled. You want to stick some values in there although the database does not have any values, and you think that doing so demonstrates mpre power than indicatig clearly to the user that the values are absent from the database.

    If you believe that nulls generated by outer join always get removed before any user sees them you have lead a very sheltered life!

    Tom

  • David Portas (8/1/2010)


    Quite simply I would have a relation for each unique type of screek.

    Are you going to define all 2 to the power 100 (minus 1) types from the start or will you add a new type each time one turns up?

    When you do unions of relations for different types of screek how will you cope with the fact that the relations you are getting a union of don't have the same header (or signature if you prefer that term) as each-other - invent values you don't have, or use something very like a NULL?

    Tom

  • David Portas (8/1/2010)


    David Portas (8/1/2010)


    Quite simply I would have a relation for each unique type of screek.

    Actually it's simpler than that. Applying Date and McGoveran's Orthogonal Design principle ensures that data isn't needlessly split between multiple relations. This simplifies things because logic that applies to any attribute doesn't need to be duplicated. So for example, instead of:

    {screekID,A1,A2}

    {screekID,A1,A2,A3}

    {screekID,A1,A2,A4,A5}

    For the same three types of screek I would do:

    {screekID,A1,A2}

    {screekID,A3}

    {screekID,A4,A5}

    screekID being the key in each case.

    I'm glad you changed your mind - you are no longer heading for 2**100 relations. And you no longer have to solve the problem of Unions over sets of relations that don't share a header (but you now have to decide what to do about values that are not present in an outer join - which is the same problem if you define your outer join in terms of such unions).

    However, if you start with those three relations what happens when a screek for which the value of A4 is available but not the value of A5 turns up? You have to invent two new relations, move data from one of your existing relations into them, and drop that existing relation. And this can go on happening as new screeks arrive.

    And of course if you gain some additional information about a screek - suppose you discover the value of A6 for one of your existing screeks: another new table, and you didn't even have to have a new screek to acquire a need for it.

    The process will go on, perhaps until you have 100 tables.

    Tom

  • David Portas (8/1/2010)


    That may be a reasonable definition in other contexts but it's no help to anyone trying to understand SQL nulls.

    Hmm. I though we were discussing the general idea of having NULLs and the fact that in either of Codd's versions of NULL (single NULL or two NULLs) it comes with a multivalued logic. That is nothing to do with SQL (as SQL doesn't implement either of Codd's versions on NULL).

    For example a null in an outer join means that a corresponding value definitely does NOT exist for that row.

    It can be argued that even in SQL a NULL in a relation defined by an outer join means nothing of the sort, it means that a datum is not present in the database - if a row is not present in the database then the data in that row are not present in the database, so SQL has no option but to return NULL to indicate that these data are absent.

    A null returned by the SUM of an empty set means zero (the null is returned instead of zero, zero being the right answer).

    Here you are right - SQL has got this one wrong. The sum of the empty set is zero. Whoever chose a different answer for SQL was mathematically illiterate.

    Tom

  • Matt Miller (#4) (8/1/2010)


    We seem to be circling around semantics again. In my book I use unknown to mean "I know nothing of this, to include whether it should have a value", as opposed to "unknown value" (i.e. it SHOULD have a value, but for some reason we don't know doesn't).

    I don't think "semantics" should be treated as a derogatory term as appears to be happening here (but I got one of my degrees for research on the semantics of infinitary calculi, so maybe I'm biased).

    I like to distinguish between "unknown" and "absent" because I see them as being different concepts. A database doesn't know anything - it contains representations of our understanding of some things, but it doesn't itself understand them. If data is absent, it isn't unknown in the database, because the concept of something being known in the database is meaningless (at least until we get databases running on top of self-aware AIs, if that ever happens) - it's just absent. Of course there's another meaning of UNKNOWN - it's a truth value in a 3VL, and that is also very different from NULL; but I don't think that one creates any confusion.

    Tom

  • David Portas (8/1/2010)


    Jeff Moden (8/1/2010)


    Maybe I have your intention wrong but look again. Tom said that a NULL could not be used to explain WHY data was missing (or unknown my book). You went off on the idea that NULL meant neither. Who's disagreeing with who? Sounds like you're talking about two different things. Then you went into the condescening-crap-mode by saying "Obviously that is nonsense" and implied that only Tom and a few other are "sensible" when it appears that even Codd disagrees with you.

    Now... Without being condescending, why do you say that NULL does NOT mean UNKNOWN in SQL when you know neither the data nor the reason for the data to be missing?

    I think you are being unfair and selective in commenting on my reply. Tom used the word "stupid" to describe people who use NULL to represent the reason why data is missing in SQL. Even though I agree with Tom that it's a stupid thing to do, I don't think I've ever called anyone stupid in this or any other forum. I accept it was probably just a rhetorical flourish on Tom's part and wasn't aimed at anyone in particular... but then you single out me for being condescending?!

    The point both Tom and I agree on is that in a single null system like SQL, the null is often used to indicate the absence of a value but nothing can reliably be inferred about why it is absent. One reason why a value is absent may be that it is unknown but there can be lots of other reasons too. For instance nulls are returned in outer joins or in the sum of an empty set, which certainly are not unknown values. The null is just used as a placeholder for a value which is known not to exist.

    So the way SQL uses it, null is not the same as unknown, it's used to mean a whole bunch of other things that have nothing to do with being unknown. If you assume that null means unknown then you will get the wrong results in SQL. Most obviously, X=X is always TRUE if X is unkown but is not true if X is null. QED.

    First of all... very nicely done on the explanation above. No sarcasm, no inferring that everyone but you and a small handful of people are the only "sensible" ones on the subject, etc, etc.

    And, yes... I'm absolutely being selective because you were the only one being condescending. I think you have a huge amount to offer in a huge number of areas but your arguments are frequently lost in sarcasm, irony, and other techniques (I take some of them as downright nasty at times) which turn people away from trying to have an intelligent conversation with you. You've done it to me many times and you even blamed it on British humor once, IIRC. Look at the difference between what you and Tom have said. Tom DIDN’T say that all managers but he and a few “sensible” people were stupid, ignorant, arrogant, or any of the other choice words he used. You, on the other hand, DID infer that only you and a small handful of people were sensible. That’s the difference and, realized on your part or not, it’s a huge difference in the eyes of the beholder.

    Shifting gears back to the problem of NULLs and what they actually are and the semantics surrounding them. I first of all, whole heartedly agree with Matt about semantics on the subject. To take it a bit further, I think the semantics have become more important and troublesome than the subject itself.

    Codd wrote that “Null Values …{snip}… are for representing missing information and inapplicable information” so, right off the bat, the father of relational databases has declared that there are two possible values for NULL…1) missing information and 2) inapplicable information. That means that despite best mathematical practices, when NULL is equated to the letter “X”, “X” represents two different things and they could either be the same or not. Than means that X neq. X when X = NULL because you don’t know which value of NULL is being used for each “X”. The value of NULL is UNKNOWN to you, me, and everyone else because it has more than one purpose and we just got done agreeing that you can’t determine the purpose of a NULL. Semantically speaking and IMHO, NULL is synonymous with the word UNKNOWN in SQL. But, to continue the semantics…

    Semantically speaking, NULLs that appear in outer joins OUTER JOINS could in fact be defined as UNKNOWN data simply because it’s missing and there wasn’t anything to join on. Why it’s missing is UNKNOWN. It all depends on which way you want to interpret Codd’s third rule. I think people spend way too much time trying to do that.

    In the UK, people drive down the left side of the road. In the US, people drive down the right side of the road. Why? Heh… it was in someone’s mind to do so. In Mathematics, NULLs are sometimes defined as the Empty Set and some will argue for that or against that (sometimes even to themselves). In SQL Server, NULLs are DEFINED as UNKNOWN and here’s the BOL entry which provides the definition…

    “A value of NULL indicates that the value is unknown. A value of NULL is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a NULL and any other value, return unknown because the value of each NULL is unknown.”

    So, whether or not X = X or { } = { } or even if every X = { } in Mathematics, in someone’s mind, or anywhere else, it just doesn’t matter because NULLs have been DEFINED to mean UNKNOWN and NULLs have been DEFINED to never, ever be equal (sans any server setting changes which, BTW, have actually been deprecated) in SQL Server.

    I’ll go one step further. Semantically speaking and flying in the face of some who would attempt to define NULL, since NULL has been defined to be a missing value (as one possible definition), one could construe that NULL truly does NOT mean NOTHING and that NULL is actually not a value but a representative condition. Since NULL has no value whatsoever (not even empty sets or zero values according to Codd’s own definition in Rule 3), it cannot be equated to itself because it has no value to equate to.

    BWAAA-HAAA!!! Like Bill Cosby said during his famous “Fat Albert” story, I told you that story so I could tell you this one. 😛 That story is that it’s not a matter if I (or anyone else, for that matter) agree with it or not, it’s defined that way in BOL and folks need to lighten up on those that say “NULL = UNKNOWN” in SQL Server because... they’re correct by definition. 😉

    Can we move on to something else, folks? Thanks. 😉

    P.S. Thanks to a good friend of mine, I was able to get a copy of the Computer World articles that carried Codd’s original 12/13 rules. Thank you all for the help in trying to help me locate a copy.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Tom.Thomson (8/1/2010)


    If you believe that nulls generated by outer join always get removed before any user sees them you have lead a very sheltered life!

    Heh... and to think I just got done telling David that he was the one being condescending. I know he started it but now I'm going to have to eat a big ol' serving of crow in public. :sick:

    You guys be nice to each other. 😉 I'm out'a here. 😛 Lordy, I picked the wrong year to quit smoking again. :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Tom.Thomson (8/1/2010)


    Matt Miller (#4) (8/1/2010)


    We seem to be circling around semantics again. In my book I use unknown to mean "I know nothing of this, to include whether it should have a value", as opposed to "unknown value" (i.e. it SHOULD have a value, but for some reason we don't know doesn't).

    I don't think "semantics" should be treated as a derogatory term as appears to be happening here (but I got one of my degrees for research on the semantics of infinitary calculi, so maybe I'm biased).

    I like to distinguish between "unknown" and "absent" because I see them as being different concepts. A database doesn't know anything - it contains representations of our understanding of some things, but it doesn't itself understand them. If data is absent, it isn't unknown in the database, because the concept of something being known in the database is meaningless (at least until we get databases running on top of self-aware AIs, if that ever happens) - it's just absent. Of course there's another meaning of UNKNOWN - it's a truth value in a 3VL, and that is also very different from NULL; but I don't think that one creates any confusion.

    I wasn't passign semantic as derogatory at all. It's more of a "we seem to be assigning different semantics to terms, thus we disagree each based on our own local definitions".

    I do understand what you are getting at with absent, etc...As to unknown, it's not that the DBMS knows anything, it's that the user of the system doesn't know that particular piece of information. As you mentioned, no system "knows" anything: it simply tracks what we know.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • [Sorry for the delayed follow up]

    Tom.Thomson (8/1/2010)


    I'm appalled. You want to stick some values in there although the database does not have any values, and you think that doing so demonstrates mpre power than indicatig clearly to the user that the values are absent from the database.

    No I do not want to "stick some values in there". I expect the user to define what results she wants from the data and/or come to me with the requirement and I will write the query for her. If there are no nulls in the database then there is no obvious reason to put them in queries against that database either.

    If you believe that nulls generated by outer join always get removed before any user sees them you have lead a very sheltered life!

    What I meant was that nulls are not part of any business requirement so they are not required as the end product of any database solution. Nulls are a feature of the technical implementation and not part of the universe of discourse or the concern of end users - most of whom don't even know what a null is.

    SQL doesn't implement either of Codd's versions on NULL

    Absolutely. Do you agree with me that SQL's version of null is seriously deficient, leads to incorrect results and that SQL nulls should be avoided or minimised in SQL database design? For the benefit of everyone who might read your posts it would help if you could be clear about whether you mean your arguments in favour of nulls to apply to SQL nulls as well. I suspect you would not advocate SQL nulls but others reading this may have a different impression.

    Part of the problem however is that there is no other well defined model of nulls to use as a point of reference. To my knowledge Codd never fully defined his. So I don't have that model at hand to be able to argue on some of these points other than by reference to what Codd did write. Codd's RM V2 book has huge gaps. Codd says that to avoid incorrect results a MVL database needs a system of tautology detection. However, he does not explain what are the tautologies that must be supported for a 1-null, 2-null or N-null version of relational algebra to work. This seems like an enormous issue to me because it goes to the heart of whether MVL causes incorrect results or not.

    My very strong impression is that MVL must always lead to incorrect results in the end (I mean results that don't accurately match the real world), but since I have no system I can test and no complete definition I can read there is only guesswork and the bad example of SQL to go by. On the other hand I do have databases and software without nulls. I can read their specs (D and its implementations for example) and I know that they give accurate results by design and in practice. On what are you basing your conclusions about non-SQL systems with nulls? Is there some definition or software I can look at? I would definitely be interested to see it and find out for myself whether it justifies your support for it.

  • Matt Miller (#4) (8/1/2010)


    A set with no members is what is called a NULL set.

    That's true. The empty set is sometimes called the null set. However, the null set (=empty set) has absolutely nothing to do with nulls in SQL or nulls in Codd's database model. The empty set is a value and just like any other value it is equal to itself. Nulls are markers in a database to indicate the absence of values. It is very unfortunate that the word null has gained these two totally different meanings.

    the addition operation is defined as an operation between two values.

    Not exactly. The sum of a single value is equal to that value. The sum of the empty set of values is 0. The product of the empty set of values is 1. These results follow from the basic principles. Think about the set of numbers {2,4}. The sum is 6. Take away 2 you have 4. Take away 4 you have 0. Proper support for the empty set is fundamentally important in a database but SQL fails this simple test.

  • David Portas (8/6/2010)


    What I meant was that nulls are not part of any business requirement so they are not required as the end product of any database solution. Nulls are a feature of the technical implementation and not part of the universe of discourse or the concern of end users - most of whom don't even know what a null is.

    I would agree with this statement if we go to the extreme of saying users don't know nothing - which may be a true statement - and, everything is a technical implementation of the business requirements.

    If we are not willing to go to that extreme let me test this case that relates to car rental system, reservations application.

    - At reservation time a row is generated describing the transaction at reservation time. Among other columns the row includes date_reservation, date_pickup_expected, date_return_expected, date_pickup_actual and date_return_actual

    - At the time the car is pick up date_pickup_actual is updated with current date.

    - At the time the car is returned date_return_actual is updated with current date.

    Please let me note that at reservation time the values of date_pickup_actual and date_return_actual are "not known", that's what business requirement says.

    So... at reservation time... which value do you use to populate date_pickup_actual and date_return_actual considering business specs are stating that those values are "not known" at that time? ... it sounds like Null value would do the trick and that's exactly what better translates the business requirements.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • PaulB-TheOneAndOnly (8/8/2010)


    So... at reservation time... which value do you use to populate date_pickup_actual and date_return_actual considering business specs are stating that those values are "not known" at that time? ... it sounds like Null value would do the trick and that's exactly what better translates the business requirements.

    If the requirement is to represent cars which have not been picked up then it makes perfect sense to put them in a table without a pickup date (or a return date). The fact that a car has been picked up could perfectly well go in a different table. No nulls required. The database designer may decide to add some nulls to the model for various reasons but those reasons are not part of any business requirement - they are simply a design choice made by the designer. Notice that the word "null" did not appear in your statement of the business requirement. Null is not part of any properly stated business requirement.

  • David Portas (8/6/2010)


    Matt Miller (#4) (8/1/2010)


    A set with no members is what is called a NULL set.

    That's true. The empty set is sometimes called the null set. However, the null set (=empty set) has absolutely nothing to do with nulls in SQL or nulls in Codd's database model. The empty set is a value and just like any other value it is equal to itself. Nulls are markers in a database to indicate the absence of values. It is very unfortunate that the word null has gained these two totally different meanings.

    the addition operation is defined as an operation between two values.

    Not exactly. The sum of a single value is equal to that value.

    The sum of the empty set of values is 0.

    The product of the empty set of values is 1.

    Again - there's no mathematical basis for these "basic principles", which is why they are arbitrary conventions. If you know of actual mathematical demonstrations of these rather than the "gentleman/woman's agreement" on the topic, I'd love to see them. Are they useful when trying to apply these to real-life scenarios? perhaps, but it doesn't make them any more valid or any less arbitrary.

    As to addition and multiplication being a unary operation, I'd be REALLY interested in a mathematical reference actually pointing that out (my 2nd grade math teacher would like a copy as well).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • As to addition and multiplication being a unary operation, I'd be REALLY interested in a mathematical reference actually pointing that out (my 2nd grade math teacher would like a copy as well).

    I was talking specifically about sum and product which are operations defined on sets of numbers (or in the relational model, sets of tuples). You can find that in any book on basic set theory. A set can be zero, one or many items - it makes no difference to the sum operation. No written demonstration required. Take a bucket containing 4 balls. Take 2 balls out, how many remain? Now take 2 more out, how many remain? If you are claiming that the right answer of 2-2 can be something other than zero or that an empty bucket plus 2 balls can be something other than 2 then I don't know what else to say.

  • David Portas (8/8/2010)


    As to addition and multiplication being a unary operation, I'd be REALLY interested in a mathematical reference actually pointing that out (my 2nd grade math teacher would like a copy as well).

    I was talking specifically about sum and product which are operations defined on sets of numbers (or in the relational model, sets of tuples). You can find that in any book on basic set theory. A set can be zero, one or many items - it makes no difference to the sum operation. No written demonstration required. Take a bucket containing 4 balls. Take 2 balls out, how many remain? Now take 2 more out, how many remain? If you are claiming that the right answer of 2-2 can be something other than zero or that an empty bucket plus 2 balls can be something other than 2 then I don't know what else to say.

    The count of members in a set is not the same as memberwise addition. Taking two 2 balls from a set with two balls in it results in a set with a COUNT of balls =0.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 46 through 60 (of 106 total)

You must be logged in to reply to this topic. Login to reply