Normalization

  • Steven993 (4/26/2011)


    tabraham 36699 (4/26/2011)

    Ideally, Chris Date's vision of an OO-Relational DB will emerge, with, among other things, a formally defined range for each attribute and an explicit value representing UNKNOWN, with operators to handle it. Until then, we're stuck with NULLs if we want to deal with the real world.

    In "Chris Date's vision" (in relational theory in fact) there is no "explicit value representing UNKNOWN". There is no UNKNOWN.

    You are quite right about Chris Date's vision. However, Chris Date's vision is the vision of one school, it isn't the whoe of relational theory. So your parenthetical "in fact" is actually fiction, not fact.

    Relational theory isn't an "ideal" thing that came from nowhere.

    Every details is there for mathematical and practical reasons; for the logical user point of view.

    Absolutely true. It came out of mathematical logic, set theory, and domain theory. Domain theory has always had NULL (except that in domain theory it is called "bottom"). Attributes (or columns, or whatever you want to call them - I was surprised to see that you use "header" where I use "signature", and I'm not sure what terminology your school of thought uses) are supposed to be defined as being in domains, not in some other magical mathematical things. Try reading the work done by Strachey and Scott on domain-theoretic semantics, and then come back and tell me (if you still believe it) that domains don't have NULL.

    Look at the Information Principle for example.

    I suggest you look at Codd's paper in which he introduced The Information Principle as the first of his 12 rules for a database to be relational. That's the same paper in which he says a database can't be relational if it doesn't have a direct and simple way of expressing NULL. He clearly saw no contradiction between the information principle and having NULL, and neither does almost anyone else who has thought it through carefully.

    I've seen that David Portas has thought carefully about it (so I respect his views although I disagree with them), and so has Fabian Pascal (although that is not always made clear in some of his more confrontation writings); but I am pretty sure Chris Date never thought it through clearly (else we would have seen a retraction of some of the rubbish in his report of the debate between himself and Codd) and on showing so far I'm pretty sure you haven't either.

    Tom

  • David Portas (4/26/2011)


    Now you are being silly. No one said nulls are "evil" or objected to them for "religious" or dogmatic reasons. The objections are about the practical disadvantages of nulls, the fact that they cause incorrect or contradictory results and break the principles of normalization. Those are reasons supported by Codd, Date and many other experts as well as a wealth of real practical evidence.

    Now it's you who is being silly David, repeating your claim (which I have debunked before) that Codd stated that NULLs cause false or contradictory results and break the princilples of normalisation. What he said about NULLs and normalisation was that the work done before NULLs were firmly proposed needed to be looked at again in the light of NULLs; and (before he made that statement in fact, as he was at the time talking about history) it was indeed looked at again (by Codd and by others) and the conclusion was that as long as no column of a key could be NULL all the normal form definitions and published normalisation algorithms continued to apply unchanged.

    What he said about NULLs and incorrect results was that if you failed to use an MVL in a system with NULLs you would inevitably get incorrect results; and if you used the SQL version of NULL you were guaranteed incorrect results (he was a mathematician, and believed - as will any mathematician, certainly including me - that the aggregate functions in SQL were just a shorthand for reductions, and clearly SQL has got that completely wrong with respect to both the empty set and the case where a NULL is encountered when forming the aggregate).

    Even Microsoft agrees (in case you value their opinion).

    What that MS page says is that queries should be so designed as to minimise the impact or NULLs and that the use of NULL should be minimised. It doesn't say "avoid NULL altogether" so clearly MS does not support your position (if it did, I imagine T-SQL would not include NULL). Using NULL only when it's needed is something I agree with completely (whether in SQL or in anything else).

    Let's keep the discussion to a sensible, practical evaluation of the pros and cons of nulls, rather than throw empty accusations around.

    Yes, I agree. It's certainly a pity that you have chosen to accuse Ted Codd of holding views which he plainly never held.

    Tom

  • Well we have a problem here.

    I think you didn't understand what 1NF is.

    I'm not quoting me. I'm quoting Date. If you really think he doesn't know what 1NF is, that's another issue entirely...

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (4/26/2011)


    I'm not quoting me. I'm quoting Date. If you really think he doesn't know what 1NF is, that's another issue entirely...

    ........ I can't believe it...

    YOU are the one telling me, my schema is violating 1NF. When it absolutely don't.

    You'r quoting things you don't understand...

  • Steven993 (4/26/2011)


    Well we have a problem here.

    I think you didn't understand what 1NF is.

    Let's take data example to make it clear !!

    What you are citing are the properties a TABLE must follow, how it must be read to be considered as a relation.

    Address(add_id, add_Name, add_zipCode, add_City, cust_id#)

    1 Personal 75000 Paris 3

    2 Work 75000 Paris 3

    AddressLine(add_id#, line_id, line_Content)

    1 1 '16 rue de la Paix'

    2 1 '4 rue de la Plaine'

    2 2 '1st Floor, 2nd door'

    That's 2 tables representing two relations.

    What the definition of 1NF is saying is that there is no importance in the row order you see. Because a relation is a set of tuples, and a set dosen't have order.

    I am obliged to order the row, to present them as a table. But in the relation behind that table there is no order !!!

    AddressLine(add_id#, line_id, line_Content)

    1 1 '16 rue de la Paix'

    2 2 '1st Floor, 2nd door'

    2 1 '4 rue de la Plaine'

    AddressLine(add_id#, line_id, line_Content)

    2 2 '1st Floor, 2nd door'

    1 1 '16 rue de la Paix'

    2 1 '4 rue de la Plaine'

    Thoses tables above are differents tables !!!! Because obviously the row order is different.

    But they represent the SAME relation. Beacause order dosen't matter.

    This is what 1NF say about order. It has nothing to tell about the values.............

    Same goes for attributes order.

    .....

    You're seriously conflating physical storage with logical data structures here.

    Saying "this row is the 1st piece of a datum, this other row is the 2nd piece of the same datum" (both define the address), is very, very explicitly defining the rows as having an inherent order for them to make sense.

    When you want to turn that table back into "real world" objects, you HAVE to work with an explicit ORDER in the rows, when you have decomposed the real object (a location on a street in a city) into ORDERED data.

    The data in any given row in that table is COMPLETELY USELESS unless you include that it is ordered when compared with other rows in the same table. That's how you've decomposed the data.

    How is that NOT "the rows have an order to them"? You've built in an ordering mechanism that explicitly relates rows to other rows based on their SEQUENCE. That's the very definition of "rows have an order that matters".

    If the order of the rows wasn't an explicit property of the data, that whole column wouldn't be needed, but the data in any given row would still be non-atomic. So if it doesn't matter if "row 1" comes before "row 2" of any given address ("the rows aren't ordered"), then why have that column. If it does matter, then the data is not in 1NF. So, either you have incorrectly decomposed the object you are trying to represent in data, or you have violated 1NF. Which is it?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • It's Bush's fault that we are having this disagreement!

    I have to take the hit as well...

    Es de nada...

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • You'r so wrong about 1NF, I can't add anything.

  • David Portas (4/26/2011)


    GSquared (4/26/2011)


    ...

    If the only way to avoid NULLs is to violate the very religion that is being cited as mandating that they are evil, then that's all I'm pointing out.

    Now you are being silly. No one said nulls are "evil" or objected to them for "religious" or dogmatic reasons. The objections are about the practical disadvantages of nulls, the fact that they cause incorrect or contradictory results and break the principles of normalization. Those are reasons supported by Codd, Date and many other experts as well as a wealth of real practical evidence. Even Microsoft agrees (in case you value their opinion).

    Let's keep the discussion to a sensible, practical evaluation of the pros and cons of nulls, rather than throw empty accusations around.

    A) It's not empty. Silly, yes. Facetiously worded? Yes. Empty? No. I'm sorry for stepping outside of the bounds of the completely literal and into language intended to point out how fatuous the argument really is.

    B) Codd later agreed that nulls have value in relational data. So don't cite him as an authority on that point.

    C) As I already pointed out, hiding NULLs by leaving out rows of data doesn't actually solve the issue. It still doesn't clarify whether the data is actually missing, or is known to not exist.

    D) No, this argument left practicality behind the moment it started citing opinions by authorities. That's ideological, not practical, which is why I started making fun of it being a "religious" issue at that point. Ideology has NO place in a technical discussion. Saying "so-and-so authority figure says you're wrong" is NOT proof. It's ideology. No mathematician will accept "Triangles with the same size sides are equal because Pythagorus said so" as proof. You shouldn't either. I certainly don't. Prove a point using mathematical rigor, based on provable or mutually accepted initial postulates (assumptions), or don't prove it at all. Atomic bombs blow up whether Oppenheimer gives them permission or not. Your databases work or don't whether Joe the Janitor says NULLs are good or bad. It's really as simple as that.

    My objection to this whole debate is the introduction of "authority" into it. My whole point is that citing authority is silly. Yes, I cited authority, then used it to prove something that I myself say is false (violation of 1NF). Re-read my posts, bypass emotional responses to what I wrote, you'll see that I refute my own claims in the same posts that I make them. I do this for a reason. The reason is point (D) on the above list.

    Using natural law, mathematical proof, solid laws of logic and illogic, you can prove useful things. Citing authority only proves that you can cite authorities; it's a logical flaw and is recognized as such in the sciences of debate and logic.

    Re-read what I wrote. You're objecting to the wrong things in it.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (4/26/2011)


    My objection to this whole debate is the introduction of "authority" into it. My whole point is that citing authority is silly. Yes, I cited authority, then used it to prove something that I myself say is false (violation of 1NF). Re-read my posts, bypass emotional responses to what I wrote, you'll see that I refute my own claims in the same posts that I make them. I do this for a reason. The reason is point (D) on the above list.

    Using natural law, mathematical proof, solid laws of logic and illogic, you can prove useful things. Citing authority only proves that you can cite authorities; it's a logical flaw and is recognized as such in the sciences of debate and logic.

    Re-read what I wrote. You're objecting to the wrong things in it.

    So let's be clear about this please. You are agreeing with my view and Steven's and Chris Date's that Steven's proposed design of a table does NOT violate 1NF because by definition a relation is always in 1NF and there is certainly no order to tuples in a relation. No appeal to authority is required for you to agree with that. Is that your true position? Your responses to Steven seemed to indicate otherwise but I'm glad if we now agree.

  • George would not be happy if he knew that we were disagreeing about this.

    I'm going to give him a heads up and get his input.

    Es da Nada…

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Steven993 (4/26/2011)


    Obviously you don't know what a predicate is, or you'r just ignoring predicate variables. Nice 😉

    I don't understand what you mean. To me a predicate variable is a variable which ranges over predicates, and I didn't see one anywhere in your message. If you mean a formal parameter of a function which generates propositions why not say that? If you mean something else... well, please tell me what you meant. And, surprising as it may seem in a mathematician who used to specialise in mathematical logic before drifting into computer science and then computing and IT, I do know what a predicate is.

    You're calling Occam's razor on the number of tables ? What a joke... Why are you normalizing in EKNF ? Just make an Excel sheet for your datas.

    No, I'm calling Occam's razor on introducing un-needed metadata into the database (which might be seen as a sort of denormalisation, or as splitting something that is already atomic into sub-atomic components). I start from EKNF as the rational place to start from - in the rare cases EKNF isn't also 5NF it's easy to see the trade-offs between further normalisation and staying at EKNF.

    And Excel is an excelled DBMS provided for any DB which requires only one relation and that relation is in DKNF(btw, is that what you would call 6NF, or do you use Date's newer definition of 6NF) and has few enough rows.

    Tom.Thomson (4/26/2011)


    Oh, so that's something on which you disagree completely with Chris Date, is it? After all, it was he who claimed you can always avoid NULL by using default values.

    Source ?

    I think I posted this reference before; in it you will see Codd and Date debating (amongst other things) Date's idea that instead of introducing NULLs you can use default values, and Date in fact gives two references to his work on using default values instead of NULL: these are

    C. J. Date: Section 5.5 ("Null Values") of An Introduction to Database Management: Volume II. Reading, Mass.: Addison-Wesley (1982)

    and

    C. J. Date, "The Default Values Approach to Missing Information," in C. J. Date and Hugh Darwen, Relational Database Writings 1989-1991. Reading, Mass.: Addison-Wesley (1992)

    I can't recall whether it was one of these that I read many years ago, leading me to believe that Date hadn't a clue about the problems with using defaults for unknown values, or whether it was something else he had written.

    If you do read the reference above, don't be confused by Dace's suggestion that all 3VLs are like a Lukasiewicz logic (they aint). The logic which should be used in a single-null SQL system is a Kleene logic, not a Lukasiewicz logic. For example the propositional operators don't include two distinct conjunctions (a weak one and a strong one) and two distinct disjunctions (a weak one and a strong one) as in Lukasiewicz logic, just one conjunction and one disjunction.

    Also remember that Codd's terminology in RM2 is a bit misleading - although the A-NULL is said to imply applicability, it actually includes the case where it is unknown whether the attribute is inapplicable or not (which I think is unclear in the RM2 paper, and i also think that terminology was a grave error which caused some of Date's confusion about RM2).

    Tom

  • George!!!

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Tom.Thomson (4/26/2011)


    David Portas (4/26/2011)


    Now you are being silly. No one said nulls are "evil" or objected to them for "religious" or dogmatic reasons. The objections are about the practical disadvantages of nulls, the fact that they cause incorrect or contradictory results and break the principles of normalization. Those are reasons supported by Codd, Date and many other experts as well as a wealth of real practical evidence.

    Now it's you who is being silly David, repeating your claim (which I have debunked before) that Codd stated that NULLs cause false or contradictory results and break the princilples of normalisation.

    Codd states that nulls cause incorrect results in several places and he gives examples to prove it. Here's what he says in "The Relational Model for Database Management Version 2" page 183:

    "A repetition of the warning about multi-valued logics that I included in

    [Codd 1986a, 1987a] may be appropriate here. Such logics can yield the

    truth-value MAYBE for an expression that happens to be TRUE because

    it happens to be a tautology... This warning applies to other multi-valued logics. It

    may be necessary in the future for DBMS products to be equipped with

    detection algorithms for simple tautologies of this kind."

    To my knowledge Codd never specified which tautologies he thought it "may be necessary" to support through such detection algorithms. That's a pity because it goes to the heart of an important issue. The users of any such system have a right to know in advance when it will give correct results and when not! I am not enthusiastic (and I am surely not alone) about grappling with any system where the inference rules and the behaviour of the system are not spelt out clearly so that I can understand them before I use it.

    I can't comment on Codd's hypothetical system used to avoid incorrect results because I am not aware of any description of such a system. Tom, if you know better then please provide a reference where I can read about it. In the absence of that information Codd's system is not to be trusted. It will sometimes cause incorrect results and contradictions as Codd himself freely admits.

    As regards nulls in SQL specifically, they also cause to incorrect results, examples of which have been discussed in these forums before. Some of them are much more serious than the examples Codd described in his system.

    Regarding normalization. We are talking about normalization principles which were defined in the domain of relations without nulls. Applying those same principles to a different domain (the domain of Codd's "extension" to the RM which includes tables with null marks - aka "Codd tables") is therefore a departure from the original intention of those normal forms and needs a different procedure from the original one. Codd says as much in "Missing Information (Applicable and Inapplicable) in Relational Databases" (1986):

    "... a mark in attribute C is semantically different from the db-values in C. Thus, the normalization concepts do NOT apply, and should NOT be applied, globally to those combinations of attributes and tuples containing marks. Instead, they should be applied: i) to a conceptual version of the database in which tuples containing missing-but-applicable information in the pertinent attributes have been removed; and 2) when each attempt is made to replace a mark by a db-value."

    Leaving aside the questionable wisdom of simply ignoring nulls during normalization, it is something new. It is not what we had before. What we had before were normal forms that required relations without nulls as sine qua non.

    Be that as it may, let's consider Codd's suggestion that normalization principles can be retro-fitted around nulls that they were never designed for. I can't properly do that with Codd's system of null marks and 4VL because he doesn't fully describe it. He does propose rules for comparisons of I-marks and A-marks but in the absence of his suggested solution for tautology detection I don't really see how one can comment on whether and when his system gives satisfactory results. For example what join dependencies would Codd's system support? e.g. is A |X| A = A always guaranteed to be true under Codd's system? I don't know. If I've missed some other work that explains this then I'd be interested to read it. Again, we are talking of things very different from the normalization we had in the domain of relations without nulls.

    One thing that Codd does make clear is that Codd tables may not have any candidate keys at all (p36 in the RMv2 book). Take the Codd table T with attributes A and B. Suppose that either A or B or both must be null for every tuple. Is T in 3NF with respect to the dependency A->B? According to Codd it is if the nulls in question are A-marks because we are supposed to ignore them. We are therefore only left with the trivial dependencies A->A, B->B. So there are no non-key dependencies even though T has no keys. So it seems that Codd's re-interpretation of 3NF permits tables without candidate keys.

    I don't know what Codd thinks we are supposed to do about normalization in the presence of I-marks. I never saw anywhere where he describes that. Again, I would appreciate it if anyone can enlighten me.

    It's only my opinion but it seems to me that if the new, re-invented normal forms admit tables without keys then they are not the normal forms we had before. They are something different because nulls broke the meaning of what we had before. Others can judge how useful Codd's revised normalization procedure is if it can result in tables without keys. I say it is still broken.

    My strong suspicion is that if Codd's system was more fully defined than it is then other serious inconsistencies with normalization of join dependencies would become clear. In the absence of that definition, let's look at SQL tables with nulls. "Nulls don't join" in SQL so the usual kind of join dependency A |X| B = C that includes nulls in any row presumably does not hold in SQL. I say "presumably" because the SQL language doesn't define the concept of relational equality. With SQL nulls involved it isn't obvious what comparison rules are supposed to apply so it's open to interpretation what the equality of two table expressions containing nulls really means in SQL.

    If we apply Codd's interpretation then his revised normalization procedure ignores the nulls. So we get a new kind of join dependency - one that may be satisfied even though it excludes rows from the projections being joined. This gives the same undesirable result as with Codd tables: tables without keys are permitted and normalization as I/we know it is broken.

  • David Portas (4/26/2011)


    GSquared (4/26/2011)


    My objection to this whole debate is the introduction of "authority" into it. My whole point is that citing authority is silly. Yes, I cited authority, then used it to prove something that I myself say is false (violation of 1NF). Re-read my posts, bypass emotional responses to what I wrote, you'll see that I refute my own claims in the same posts that I make them. I do this for a reason. The reason is point (D) on the above list.

    Using natural law, mathematical proof, solid laws of logic and illogic, you can prove useful things. Citing authority only proves that you can cite authorities; it's a logical flaw and is recognized as such in the sciences of debate and logic.

    Re-read what I wrote. You're objecting to the wrong things in it.

    So let's be clear about this please. You are agreeing with my view and Steven's and Chris Date's that Steven's proposed design of a table does NOT violate 1NF because by definition a relation is always in 1NF and there is certainly no order to tuples in a relation. No appeal to authority is required for you to agree with that. Is that your true position? Your responses to Steven seemed to indicate otherwise but I'm glad if we now agree.

    I say the rows in that table are "ordered" because there's a column that specifically determines the order of the rows. Look at the table design, it's right there in the DDL.

    Why do you say they aren't "ordered" by that?

    My point about authority is, "they're ordered, so they violate Date's definition of 1NF, so what?" Obviously, having a column that orders that data, so that it can be decomposed and recomposed correctly, is an engineering solution to a real-world issue, and is a perfectly valid one. Saying "there's something wrong with it because Date says you can't 'order' the rows" is recourse to authority, and completely assinine thereby.

    So, by one reading of one authority's orthodoxy on 1NF, it does violate it.

    And, honestly, the basic idea behind relational data storage is the concept that modifying one attribute of one object should result in one change in one relation to one tuple. The whole, "if I need to change Susan's last name because she got married, I should only update one row in one table" thing. This design, having a column that indicates "row 1 of the address", "row 2 of the address" does potentially violate that underlying principle. If you have 2 rows for an address, and find out that you need another row in between them, then you have to modify one of the existing rows before you can add the new row. That's a violation of the principle underlying the whole concept of Normal Forms. However, I don't think that makes it an invalid solution, if that's what you're asking.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (4/27/2011)


    I say the rows in that table are "ordered" because there's a column that specifically determines the order of the rows. Look at the table design, it's right there in the DDL.

    Why do you say they aren't "ordered" by that?

    Because the attribute you are talking about is only an attribute just like any other attribute in a relation. Relations are always unordered sets of tuples by definition - no matter what values you put in them. That's not just what Date says, it's what a relation is, was and always has been. A relation doesn't stop being a relation just because of what values you assign to it.

Viewing 15 posts - 76 through 90 (of 144 total)

You must be logged in to reply to this topic. Login to reply