Normalization

  • Steven993 (4/27/2011)

    I actually have Date on Database Writings 2000-2006 under my eyes.

    Chapter 8, 30 apges is dedicated to that topic.

    p127

    Definition

    A table is normalized - equivalently, it is in first normal form, 1NF- if and only it is a direct and faithful representation of some relation.

    Is that an exact quote and that sentence is his only definition of 1NF?

    If so then where did

    1.There's no top-to-bottom ordering to the rows.

    2.There's no left-to-right ordering to the columns.

    3.There are no duplicate rows.

    4.Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).

    5.All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].

    come from?

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David Burrows (4/28/2011)


    Steven993 (4/27/2011)


    David Burrows (4/27/2011)


    Your example shows violation of Codd's 1NF and is what I understood to be 1NF.

    My example, you mean AddressLine table ? It's is not a 1NF violation..

    No sorry, my bad :blush:, I should have been more specific. I was referring to

    S# CITY

    1 'London'

    2 'Paris'

    3 'Paris, Madrid'

    I've explained it. What do you don't understand ? I can't explain it better....

    To be in 1NF, we must have exactly one value for CITY's domain in tuple 3. THere is one value regarding that domain. That's in 1NF.

  • David Burrows (4/28/2011)


    Steven993 (4/27/2011)

    I actually have Date on Database Writings 2000-2006 under my eyes.

    Chapter 8, 30 apges is dedicated to that topic.

    p127

    Definition

    A table is normalized - equivalently, it is in first normal form, 1NF- if and only it is a direct and faithful representation of some relation.

    Is that an exact quote and that sentence is his only definition of 1NF?

    Of course....

    David Burrows (4/28/2011)


    If so then where did

    1.There's no top-to-bottom ordering to the rows.

    2.There's no left-to-right ordering to the columns.

    3.There are no duplicate rows.

    4.Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).

    5.All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].

    come from?

    I don't know where GSquared take this from.

    But I don't desagree with what is written here... Chris Date has writted things like that. Yes.

    But that's NOT 1NF definition !_!

    Relation != Table

  • Steven993 (4/28/2011)


    David Burrows (4/28/2011)


    Steven993 (4/27/2011)


    David Burrows (4/27/2011)


    Your example shows violation of Codd's 1NF and is what I understood to be 1NF.

    My example, you mean AddressLine table ? It's is not a 1NF violation..

    No sorry, my bad :blush:, I should have been more specific. I was referring to

    S# CITY

    1 'London'

    2 'Paris'

    3 'Paris, Madrid'

    I've explained it. What do you don't understand ? I can't explain it better....

    To be in 1NF, we must have exactly one value for CITY's domain in tuple 3. THere is one value regarding that domain. That's in 1NF.

    I do understand, probably my bad references caused the problem here, my apologies 🙂

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Steven993 (4/28/2011)


    David Burrows (4/28/2011)


    Steven993 (4/27/2011)

    I actually have Date on Database Writings 2000-2006 under my eyes.

    Chapter 8, 30 apges is dedicated to that topic.

    p127

    Definition

    A table is normalized - equivalently, it is in first normal form, 1NF- if and only it is a direct and faithful representation of some relation.

    Is that an exact quote and that sentence is his only definition of 1NF?

    Of course....

    David Burrows (4/28/2011)


    If so then where did

    1.There's no top-to-bottom ordering to the rows.

    2.There's no left-to-right ordering to the columns.

    3.There are no duplicate rows.

    4.Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).

    5.All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].

    come from?

    I don't know where GSquared take this from.

    But I don't desagree with what is written here... Chris Date has writted things like that. Yes.

    But that's NOT 1NF definition !_!

    Relation != Table

    Thank you 🙂

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David Burrows (4/28/2011)


    Steven993 (4/27/2011)


    David Burrows (4/27/2011)


    Your example shows violation of Codd's 1NF and is what I understood to be 1NF.

    My example, you mean AddressLine table ? It's is not a 1NF violation..

    No sorry, my bad :blush:, I should have been more specific. I was referring to

    S# CITY

    1 'London'

    2 'Paris'

    3 'Paris, Madrid'

    Whether it is a violation of 1NF depends on the domain for the CITY attribute. From the name of the attribute, one would expect the domain to be "names of cities" and since 'Paris, Madrid' isn't the name of a city that would imply that there is a violatiopn of 1NF here. If the domain for the CITY attribute is "text containing one or more city names separated by commas" there is technically no violation of 1NF here but it seems to me that the spirit of the definition (though not the letter) is evaded. If the domain for CITY is "any text string" and things like '#Jgf%$£=123^%zoink' can turn up as well as things like the three values shown there is certainly no violation of 1NF (neither the letter nor the spirit is violated) but the attribute naming is seriously stupid.

    That's the trouble with 1NF. When Codd talked about a simple value in a domain some took it to mean that domains like "lists of city names" were forbidden; but Codd specifically talked about complex types (although he insisted on a first order algebra - a domain could have members which were relations); and Date took it even further (allowed domains to have relations as members). This has caused an enormous amount of debate over the years, because people often feel that the atomicity principle is rendered meaningless by allowing complex types. But that atomicity principle must not be allowed to contradict the KISS principle! If I want the complex number r(cos?+isin?) why should I have to store it as two separate attributes (either modulus and phase or real part and imaginary part) and there is actually only one thing? Why should I have to store rational numbers like 0.3 as two separate attributes (numerator and denominator or significand and exponent)? Should CHAR(2) not be an allowable SQL type just because some atomicity fundamentalist thinks I should represent such attributes as two separate single character attributes?

    Tom

  • Read the entire post. There is domain definition in it.

    Of course there was debate about the word "atomicity".

    Is the VARHAR(5) string 'Hello' atomic ?

    Can we decompose it in 'H', 'e', 'l', 'l', 'o' ?

    Yes certainly you can do that if you want. but no, it's in 1NF.

  • Steven993 (4/28/2011)


    I don't know where GSquared take this from.

    Well he has told you plainly and clearly, even down to the page number, but you appear to be too lazy to turn the page. As you've just quoted from page 127, try reading from the bottom of that page on to a little way down the next page. You'll then find that you have read the passage quoted. It's right there, where Gus told you it was. Chris is saying (this is a paraphrase, I don't have the book here) "a relation is a table in 1NF; this is what it means for a table to be a relation, ie in 1NF: 1......" which quite plainly makes each of his conditions 1 to 5 part of his definition of 1NF.

    So please stop telling us that 1NF has nothing to do with what 1NF is. And stop being too lazy (or is it just too plain arrogant?) to follow up the extremely plain and clear reference you have been given, especially when you already have the book open at the page on which the passage begins.

    Tom

  • So there is no logical difference between Relation and Table ?

    What Date is saying is that :

    When you read a table. Because of it's 2 dimensional representation, you might think that there a specific order between attributes or between rows in the relation represented by that table.

    This is not a 1NF definition..

    He just took 1NF definition and created rules from it for tables. So when you read a table, you must remember those simple rules, to understand the relation behind, if you don't know what a relation is.

  • Steven993 (4/28/2011)


    Read the entire post. There is domain definition in it.

    Of course there was debate about the word "atomicity".

    Is the VARHAR(5) string 'Hello' atomic ?

    Can we decompose it in 'H', 'e', 'l', 'l', 'o' ?

    Yes certainly you can do that if you want. but no, it's in 1NF.

    Did I suggest that your original post didn't contain a definition of the domain? No.

    Did I suggest that I would support the atomicity fundamentalists? No.

    Is there a domain definition in the entire post to which I responded? No.

    Did I manage to answer Dave's question without displaying bad temper and worse manners? Yes.

    So what's the point of your ill-tempered and ill-mannered response?

    Why does everything you post increase my feeling that you are trying (but utterly failing) to emulate Fabian Pascal's style?

    Tom

  • Steven993 (4/28/2011)


    So there is no logical difference between Relation and Table ?

    There's a semantic difference. So far as I'm aware neither is a term of logic, so there can't be a logical difference.

    I suggested a definition for a relation in a given state a few posts back. Did you look at that? Do you think that defintion could imaginably be the definition of a table?

    Try not to be so boringly patronising.

    Tom

  • Tom.Thomson (4/27/2011)


    But a relational equality can easily be defined in the Codd system: two relational systems are equal if they both contain the same rows; two rows are the same if every attribute of one is the same as the corresponding attribute in the other; two attributes are the same they have the same type and [both are a-NULL or both are i-NULL or [neither is null] and they have equal values]. That definition of sameness of rows has an implication for uniqueness that Codd might have disapproved of (he wanted two attribute which are both a-NULL to be regarded as different for the purpose of evaluating constraints, including UNIQUE constraints, so pehaps including the implicit UNIQUE constraint on the complete attribute maps in collection of rows that make up a relation; but that seems to me to be clearly a dead end when looking at relation). I don't know how to define a relational equality that Codd would definitely have approved of (because it seems necessary to define row sameness to define relational equality), so yes, there's some incompleteness in Codd's work - just as there is in every scientist's or mathematician's work.

    That's been bothering me, because I should know what Codd's view was on tuple equality, so I did some digging. There's a partial answer in his "Extending the Database Relational Model to Capture More Meaning" which he submitted to ACM in March '79. It turns out that in 1979 he chose pretty much what I suggested using above (apart from the misplaced "]" - the two "]"s should have been together at the point where the second one is, otherwise the inner bracketing doesn't resolve the ambiguity or the English), so he presumably would have mostly agreed with my proposed definition or equality between relations with nulls. But that isn't 100% certain because although he begain by talking about the 2 NULL system he then said he would consider them both as just a single "missing value" NULL and use a 3VL instead of a 4VL, before going on to say when checking two tuples for equality for the purpose of considering whether one tuple was a duplicate of the other two nulls would be considered as equal; so we don't know whether in the 4VL version he would have considered a-NULL ands i-NULL equal to each other for this purpose or would have treated them as unequal (My suggestion above treats them as unequal in for that test, but I think it would be better to treat them as equal - not as much better as not having i-NULL at all would be, of course).

    Next time I'll wait before posting if I think I'm nissing something like that.

    Tom

  • I did not state my intent correctly.

    When I said Billing Address or Mailing Address I meant a separate table with an Address Type.

    I have normalize beyond the 3rd Normal Form when appropriate, so I will take the hit and I apologize for offending anyone and I would appreciate it if we could all move on and agree to disagree and respect each others differences.

    With respect to a separate table for each address line with an address line type that is a judgement decision and I did not mean to offend anyone.

    I regret some of the comments that I said but can we put our differences aside and play nice in the sandbox?

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Welsh, I don't think you need to worry about offending anyone, you haven't done anything outside the bounds of normal conversation (except perhaps that long string of short messages when you were distracted by something, none of which were in any way offensive). If we're kicking sand at each other here it's not your fault.

    Tom

  • Tom.Thomson (4/27/2011)


    Ted Codd


    There are two options:

    1. warn users not to use tautologies as conditions in their relational-lan-

    guage statements (tautologies waste the computer's resources);

    2. develop a DBMS that examines all conditions not in excess of some

    clearly specified complexity, and determines whether each condition is

    a tautology or not.

    Naturally, in this latter case, it would be necessary to place some limitation

    on the complexity of each and every query, because with predicate logic the

    general problem is unsolvable. In my opinion, option 1 is good enough for

    now because this is not a burning issue.

    In other words he took the view that tautology detection was unneccessary at present. So that answer's your question: assume the deduction rules don't contain any tautology detection. If you have a 4VL make your language warn you clearly when the i and a truth values crop up; and if you have a 3VL make your language warn you when the u truth value turns up.

    I had always assumed that the detection mechanism mentioned by Codd would have to be applied at runtime and evaluated for each tuple involved in a query rather than just be based on static analysis of the query syntax. The reason I thought runtime evaluation was necessary was because of situations like the following one. Using the Codd table:

    z x

    ---- ----

    4 2

    0 a-null

    A restriction based on the condition that z / x = 0 would apparently return no tuples (based on RM2 page 185) even though z = 0 divided by any applicable value is known to be 0. I certainly don't share Codd's curious optimism that this is "not a burning issue" - but for the purposes of this discussion I will assume for the time being that it isn't.

    I'm prepared to consider some other 3VL system but I'm not sure exactly what you are proposing if your system is not Codd's 4VL and not SQL's 3VL. Just to restate your explanation of relational equality and join dependency. I understand that you think null marks of the same kind for the same attribute should be considered equal for the purposes of relational comparison but not for joins. So where T is a table then T = T is always true (with nulls or without) but T |X| T = T is not necessarily true. In other words a table does not necessarily satisfy even trivial join dependencies. Is that correct?

    So for example if I have some things (in the DB: some maps in a relation) such that for each thing (in the DB: map - I suppose I have to call it tuple in a RDB context although it's mathematically inappropriate) R it is true that if R.Y has a non-null value then R.Z must be inapplicable and vice-versa then that constrain in the modelled world is a functional dependency in the model - a functional dependency requiring something to be NULL. If I ignore nulls, normalisation will not eliminate non-trivial functional dependencies like this one, so I can't ignore NULLs for normalisation. If I throw away i-NULL and just have one pure NULL, I can't construct functional dependencies like this, so goodbye i-NULL and 4VL, let's get back to a single pure null and a 3VL.

    Wouldn't you want to define funcational dependency in an analogous way to how you've defined join dependency? I mean that an FD A->B is never satisfied by any table where B permits nulls? If you allow the FD A->B to be satisfied when B permits nulls then you would have FDs that were not also JDs. To comply with Fagin's definition of PJ/NF the set of join dependencies satisfied by a Codd table would have to be equivalent to the set of all key dependencies (FDs, MVDs and JDs implied by the keys of the table). So if A->B is a permissable FD where B allows nulls then by your definition of a JD you could have a Codd table which satisfied PJ/NF but not 3NF.

    Alternatively, assume the FD A->B is never satisfied if B permits nulls. Now we seem to have a situation even worse than the kludge that Codd suggested for applying normalization with nulls. Codd proposed that an FD could be in effect if you ignored only the rows where nulls occurred. However, if we insist that an FD is also a JD then by your definition of JDs it seems that a FD doesn't permit nullable attributes either. So we would have a strange situation where we can add any nullable attributes we like to a Codd table, without any risk of creating a non-key dependency and so normalization becomes a useless exercise in the face of those nullable attributes.

    I can believe that it might just be possible to devise some hypothetical system of nulls and tables with a set of derivation rules that makes sense of the normal forms we know. That hypothetical system is certainly not SQL. Nor will it be RM2 if you want 3VL instead of 4VL. In the absence of the definition of any such system (please define it if you think you can) I stand by my orginal comments on nulls, which are borne out when the consequences of RM2 are considered: None of 1NF,2NF,3NF,BCNF,4NF,5NF,PJ/NF,DKNF allow nulls. A table that permits nulls is not an accurate representation of a relation that satisfies any of those NFs.

    In the design of a database one does not normalise views. In fact it is utter nonsense to want to talk about 3NF for views

    It certainly isn't nonsense in the relational model without nulls. In the absence of nulls a view is a derived relation and a derived relation appears the same and has all the same essential properties as any base relation. Normalization is orthogonal to whether relations are base or derived and the database designer ought to be able to apply normalization to any relation schema without concerning himself about whether relations are base or not. If normalization theory is rendered useless (as I think it must be) by the presence of derived Codd tables then that is a problem caused by Codd's model of keyless tables. It is not a problem that exists in the relational model without nulls.

Viewing 15 posts - 121 through 135 (of 144 total)

You must be logged in to reply to this topic. Login to reply