Normalization

  • Steven993 (4/27/2011)


    David Burrows (4/27/2011)


    Your example shows violation of Codd's 1NF and is what I understood to be 1NF.

    My example, you mean AddressLine table ? It's is not a 1NF violation..

    David Burrows (4/27/2011)


    But I was referring to Date's 1NF and in particular to the reference

    There's no top-to-bottom ordering to the rows

    Did Date specify this?

    To what does it appertain?

    There is no debate about this point...

    A relation is a set of tuples. A set dosen't have order. That's all.

    What has been referred to as "Date definition of 1NF" so far... is not a definition of 1NF !!

    I said it at least 3 times.

    David Burrows (4/27/2011)


    But if it is a rule that is supposed to be applied to design then I would consider any attribute, whatever the type, that indicates any ordering of the rows, a violation of that rule.

    There is no such rule.

    1NF or relational theory in general, except the "atomic" value of tuples, dosen't say anything about values or their meaning...

    Date disagrees with you on the definition of 1NF. I already quoted his definition. You cited him as an acceptable authority on the subject of NULLs. Are you now asserting that he doesn't know what he's talking about when it comes to normal forms, though he does know about NULLs?

    Here's the source of his definition: http://www.dbdebunk.com/page/page/629796.htm

    You say it's "not a definition of 1NF", Date says it is an integral part of the definition. Or can you prove he didn't write that? Is that what you're saying? Either you're wrong about the definition, he's wrong about the definition, or that definition didn't actually come from him. Which is it?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • You don't know what is a relation.

    You can't see the difference between relation and table......

    I actually have Date on Database Writings 2000-2006 under my eyes.

    Chapter 8, 30 apges is dedicated to that topic.

    p127

    Definition

    A table is normalized - equivalently, it is in first normal form, 1NF- if and only it is a direct and faithful representation of some relation.

    Then he describes how to read those properties a table must follow to be interpreted as a relation.

    You don't know obviously what normalization is.

    Every relation is in 1NF. We have to start somewhere.

    1NF relations can have data redondancies, which can lead to update anomalies.

    The aim on further normal forms (2NF, 3NF, BCNF, ...) is to reduce those data redundancies, cutting a relation into 2 or more relation....

    As I already said on this topic, normalization is not, and has never been an obligation.

    You can leave perfecty with 1NF relations only if you want.

  • Steven993 (4/27/2011)


    You don't know what is a relation.

    You can't see the difference between relation and table......

    Yes, I do know these things. You don't know debate, logic, or how to defend an assertion. I appologize for using advanced techniques on a neophyte and confusing you thereby.

    (And, yes, I know that can be read in an insulting manner. I'm sorry, I don't know a better way to respond to your assinine assertions. You don't even know me, much less know what I know or don't know.)

    You added a bunch of fluff to your post after I quoted it. Your original post, entirely bereft of any meaningful content, merely an attempt to insult someone you don't even know, is intact in my citation of it.

    Now, your kindergarten teacher wants to talk to you. Or maybe she's your babysitter. (And that's only possible to read as an insult. Reap what you sow.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Steven993 (4/27/2011)


    You don't know what is a relation.

    You can't see the difference between relation and table......

    I actually have Date on Database Writings 2000-2006 under my eyes.

    Chapter 8, 30 apges is dedicated to that topic.

    p127

    Definition

    A table is normalized - equivalently, it is in first normal form, 1NF- if and only it is a direct and faithful representation of some relation.

    Then he describes how to read those properties a table must follow to be interpreted as a relation.

    You don't know obviously what normalization is.

    Every relation is in 1NF. We have to start somewhere.

    1NF relations can have data redondancies, which can lead to update anomalies.

    The aim on further normal forms (2NF, 3NF, BCNF, ...) is to reduce those data redundancies, cutting a relation into 2 or more relation....

    As I already said on this topic, normalization is not, and has never been an obligation.

    You can leave perfecty with 1NF relations only if you want.

    Thank you for changing your statement after the fact. I'm now going to go edit mine accordingly.

    You still haven't proven your position by any means. I'm going to assume at this point that you have no intention to do so, and are merely interested in defending a set of beliefs that you hold to strongly on the basis of faith.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • David Portas (4/27/2011)


    Codd states that nulls cause incorrect results in several places and he gives examples to prove it. Here's what he says in "The Relational Model for Database Management Version 2" page 183:

    "A repetition of the warning about multi-valued logics that I included in

    [Codd 1986a, 1987a] may be appropriate here. Such logics can yield the

    truth-value MAYBE for an expression that happens to be TRUE because

    it happens to be a tautology... This warning applies to other multi-valued logics. It

    may be necessary in the future for DBMS products to be equipped with

    detection algorithms for simple tautologies of this kind."

    To my knowledge Codd never specified which tautologies he thought it "may be necessary" to support through such detection algorithms. That's a pity because it goes to the heart of an important issue. The users of any such system have a right to know in advance when it will give correct results and when not! I am not enthusiastic (and I am surely not alone) about grappling with any system where the inference rules and the behaviour of the system are not spelt out clearly so that I can understand them before I use it.

    There are three points to make here: first, Codd was stating clearly that we have to cope with undecidability here - he had made a mistake with that before (assuming that it was actually decidable whether a view is or is not updatable) and was trying not to repeat that with determining what propositions were tautologies. I personally find it reasonable that when something is unknown we will have gaps in our knowledge, and I don't think a result that says "these are the rows for which the filter evaluates to true but there are also some rows which the filter evaluates to unknown" is an incorrect result. I know SQL that doesn't in any useful sense say that, but that's what a DB implementing RM2 is supposed to say. Second, the first and biggest opponent of using MVL in relational calculus, Date, stated plainly that he did not believe that for an integer X the proposition X<0 OR X=0 OR X>0 is a tautology in any 3VL, ie Codd was wrong and no tautology detection was needed or useful (see the "much ado" reference posted previously). I actually think Date was wrong on that, but he was right in that that proposition certainly is NOT a tautology in RM2's 4VL, Codd got that wrong, because it is not true but missing-inapplicable if X is the i-NULL, and since the a-NULL is actually used in the case where it is unknown whether the i-null would be correct (what a mess - I've said before that the detail of RM2's logic and NULLs had faults) it's not true but unknown (which Codd called missing-applicable, although it didn't entail applicability; that's the main cause of the faults) because we don't know whether it's true or missing-inapplicable when X is the a-NULL. And Third, when he said "in the future" in what's quoted above he meant not any time soon because there are more impotant things to worry about: to see that clearly maybe we should look at what suggested was good enough (pages 181/182 of RM v2) for handling "unknown" (as opposed to a "maybe" in the three-valued logic obtained by treating both missing-inapplicable and missing-applicable in a four-valued logic as "maybe" to get a 3-valued logic equivalent to what Codd called the 3-valued logic of RM V1; the passage you quote is clearly in the context of that logic with "maybe" instead of the 4VL, although the earlier papers he referenced in that passage are about an earlier 4VL):

    Ted Codd


    There are two options:

    1. warn users not to use tautologies as conditions in their relational-lan-

    guage statements (tautologies waste the computer's resources);

    2. develop a DBMS that examines all conditions not in excess of some

    clearly specified complexity, and determines whether each condition is

    a tautology or not.

    Naturally, in this latter case, it would be necessary to place some limitation

    on the complexity of each and every query, because with predicate logic the

    general problem is unsolvable. In my opinion, option 1 is good enough for

    now because this is not a burning issue.

    In other words he took the view that tautology detection was unneccessary at present. So that answer's your question: assume the deduction rules don't contain any tautology detection. If you have a 4VL make your language warn you clearly when the i and a truth values crop up; and if you have a 3VL make your language warn you when the u truth value turns up.

    David Portas


    I can't comment on Codd's hypothetical system used to avoid incorrect results because I am not aware of any description of such a system. Tom, if you know better then please provide a reference where I can read about it. In the absence of that information Codd's system is not to be trusted. It will sometimes cause incorrect results and contradictions as Codd himself freely admits.

    Assume no tautology detection. Assume that you have more sense than to use apparent tautologies in queries. Assume that the system will warn you when the i or a truth values are encountered when processing your query, and give you results that show you rows that divered the t truth value. I think it's pretty clear in the RM2 tome that that was Codd's intention, although he though that adding some limited tautology detection might be useful some time in the future. On that I'm pretty sure he was wrong; we don't mess about with tautology detection assuming 2VL when we are working in intuitionist logic, we sometimes add axioms to constructive set theory to do positively strange things like assert things that could be proved if we uses 2VL to see what will happen, but we don't mess about and say "because 2VL would have proved that we have missed a tautology", no indeed: the whole point of the constructive logics was to avoid the use of 2VL, both in the light of all the paradoxes and inconsistencies that were showing up in the logical foundations of mathematics around the turn of the century (from 19th to 20th) and the inherent flakiness of existence proofs using 2VL which offered no prospect of constructing the object proven to exist (a practise introduced by David Hilbert in the late 19th century, which caused the biggest controvery ever in modern mathematics).

    David Portas


    As regards nulls in SQL specifically, they also cause to incorrect results, examples of which have been discussed in these forums before. Some of them are much more serious than the examples Codd described in his system.

    Well, that's certainly something we don't disagree on. SQL NULL is a real pain, not completely useless but seriously abused by the sql standard. How can we expect programmers not to abuse it when the language abuses it by for example specifying NULL for the sum of an empty set? How can we expect people to cope with 3VL when there is practically no indication of when the u truth value is encountered? And how can we expect people to understand what's going on when they are told that NULL represents "unknown" (a truth value) rather than "not present" (a statement about the value of an attribute or variable)?

    David Portas


    Regarding normalization. We are talking about normalization principles which were defined in the domain of relations without nulls. Applying those same principles to a different domain (the domain of Codd's "extension" to the RM which includes tables with null marks - aka "Codd tables") is therefore a departure from the original intention of those normal forms and needs a different procedure from the original one. Codd says as much in "Missing Information (Applicable and Inapplicable) in Relational Databases" (1986):

    "... a mark in attribute C is semantically different from the db-values in C. Thus, the normalization concepts do NOT apply, and should NOT be applied, globally to those combinations of attributes and tuples containing marks. Instead, they should be applied: i) to a conceptual version of the database in which tuples containing missing-but-applicable information in the pertinent attributes have been removed; and 2) when each attempt is made to replace a mark by a db-value."

    Leaving aside the questionable wisdom of simply ignoring nulls during normalization, it is something new. It is not what we had before. What we had before were normal forms that required relations without nulls as sine qua non.

    I think Codd had this wrong in 1986. The i-NULL can't be just ignored. Certainly it doesn't conform to the impression I formed of his views some years after that, but I think I would disagree with with even his later views about the handling of NULLs and constraints and normalisation. But I don't think NULLs impose an insuperable problem for normalisation, and I don't think complete absence of NULLs was ever a sine qua non because all the original definitions of the normal forms (short of Date's 6NF, but including the original 6NF) carry over easily without modification provided one says that keys can't include nullable columns (if one has only a single kind of NULL which implies nothing other than that the value of the attribute concerned is not available in the current view of the database).

    David Portas

    Be that as it may, let's consider Codd's suggestion that normalization principles can be retro-fitted around nulls that they were never designed for. I can't properly do that with Codd's system of null marks and 4VL because he doesn't fully describe it. He does propose rules for comparisons of I-marks and A-marks but in the absence of his suggested solution for tautology detection I don't really see how one can comment on whether and when his system gives satisfactory results. For example what join dependencies would Codd's system support? e.g. is A |X| A = A always guaranteed to be true under Codd's system? I don't know. If I've missed some other work that explains this then I'd be interested to read it. Again, we are talking of things very different from the normalization we had in the domain of relations without nulls.

    Actually it's worse than that. In our various discussions last year you brough me very close to the conclusion that inclusion of the i-NULL and the corresponding 4VL were a mistake and the only sensible approach to NULL was a to have just one NULL with the meaning "the value is not in the database"; I've though about it quite a lot since then (and even some of Steven993's comments have helped). I've now reached the conclusion that the 2 NULL system really is a bad thing (although I can understand how Codd got to where he got) to. The thing is that certain constraints can exist in the world being modelled and ought to be cleanly expressable in the database world. So for example if I have some things (in the DB: some maps in a relation) such that for each thing (in the DB: map - I suppose I have to call it tuple in a RDB context although it's mathematically inappropriate) R it is true that if R.Y has a non-null value then R.Z must be inapplicable and vice-versa then that constrain in the modelled world is a functional dependency in the model - a functional dependency requiring something to be NULL. If I ignore nulls, normalisation will not eliminate non-trivial functional dependencies like this one, so I can't ignore NULLs for normalisation. If I throw away i-NULL and just have one pure NULL, I can't construct functional dependencies like this, so goodbye i-NULL and 4VL, let's get back to a single pure null and a 3VL.

    [/quote]

    David Portas


    One thing that Codd does make clear is that Codd tables may not have any candidate keys at all (p36 in the RMv2 book). Take the Codd table T with attributes A and B. Suppose that either A or B or both must be null for every tuple. Is T in 3NF with respect to the dependency A->B? According to Codd it is if the nulls in question are A-marks because we are supposed to ignore them. We are therefore only left with the trivial dependencies A->A, B->B. So there are no non-key dependencies even though T has no keys. So it seems that Codd's re-interpretation of 3NF permits tables without candidate keys.

    In the design of a database one does not normalise views. In fact it is utter nonsense to want to talk about 3NF for views; one of the most common uses of the JOIN operator is to rebuild something that was split up in order to make it conform to 3NF, and I hope you don't imagine that anyone would accept a restriction that they couldn't write statements (and define views) that used the join operator to produce a set of rows which did not consitute a table in 3NF.

    As for keys, the important statement towards the bottom of page 35 reads

    Ted Codd


    For each and every base R-table, the DBMS must require that one

    and only one primary key be declared. All of the values in this

    simple or composite column must be distinct from one another at

    all times. No value in any component column is allowed to be missing

    at any time.

    And indeed it is only for some views that no key is defined, never for a base table. Should every view be a relation? Should evey query produce a relation? It's potentially an interesting question. My answer would be NO - even without the presence of NULLs I may want to include duplicates (which would preclude the result being a relation). But maybe I could go for adding an occurrence count column? And anyway, I certainly want to be able to order the output from a statement, which also precludes its being a relation (a relation is a set, unordered). There's also a pretty strong statement that the system must wherever possible define a primary key on page 36, a statement which I think is silly (as such a primary key will often be utterly useless) but ought to at least mitigate your conserns.

    David Portas


    I don't know what Codd thinks we are supposed to do about normalization in the presence of I-marks. I never saw anywhere where he describes that. Again, I would appreciate it if anyone can enlighten me.

    He appears to have thought we could just ignore them. If so, he was wrong (but I think you knew that already, in fact I'm sure you knew it before I did).

    David Portas


    It's only my opinion but it seems to me that if the new, re-invented normal forms admit tables without keys then they are not the normal forms we had before. They are something different because nulls broke the meaning of what we had before. Others can judge how useful Codd's revised normalization procedure is if it can result in tables without keys. I say it is still broken.

    I think that's just FUD. There are no tables permitted without keys, as is made absolutely clear on page 35. The non-table things you are calling tables should NEVER be normalised anyway - they are views not tables. The whole schema could be in 5NF and I could perfectly valid relational operations that produced results that are not in 5NF, nor 4NF, nor BCNF, nor EKNF, nor 3NF, nor even in 2NF because that may be the output I need for my application. The principle that the result of any selection (including projections, joins, scalar operations on attributes, aggregatioions, in fact anything that can be done in the language) can be a view prevents normalisation of views beyond 1NF (and probably precludes requiring views to be in 1NF too - but I haven't bothered to look at that).

    David Portas


    My strong suspicion is that if Codd's system was more fully defined than it is then other serious inconsistencies with normalization of join dependencies would become clear. In the absence of that definition, let's look at SQL tables with nulls. "Nulls don't join" in SQL so the usual kind of join dependency A |X| B = C that includes nulls in any row presumably does not hold in SQL. I say "presumably" because the SQL language doesn't define the concept of relational equality. With SQL nulls involved it isn't obvious what comparison rules are supposed to apply so it's open to interpretation what the equality of two table expressions containing nulls really means in SQL.

    If we apply Codd's interpretation then his revised normalization procedure ignores the nulls. So we get a new kind of join dependency - one that may be satisfied even though it excludes rows from the projections being joined. This gives the same undesirable result as with Codd tables: tables without keys are permitted and normalization as I/we know it is broken.

    And SQL allows duplicates, so relational equality is going to be a bit weird even without NULL. Noone here is going to defend the strange behavious of SQL, I hope - we are after all discussing relational systems, not something which Codd and Date agreed was hopelessly non-relational.

    But a relational equality can easily be defined in the Codd system: two relational systems are equal if they both contain the same rows; two rows are the same if every attribute of one is the same as the corresponding attribute in the other; two attributes are the same they have the same type and [both are a-NULL or both are i-NULL or [neither is null] and they have equal values]. That definition of sameness of rows has an implication for uniqueness that Codd might have disapproved of (he wanted two attribute which are both a-NULL to be regarded as different for the purpose of evaluating constraints, including UNIQUE constraints, so pehaps including the implicit UNIQUE constraint on the complete attribute maps in collection of rows that make up a relation; but that seems to me to be clearly a dead end when looking at relation). I don't know how to define a relational equality that Codd would definitely have approved of (because it seems necessary to define row sameness to define relational equality), so yes, there's some incompleteness in Codd's work - just as there is in every scientist's or mathematician's work.

    Given a definition of relational equality, join dependencies (and hence the definition or 5NF) are a non-problem. If there's a non-trivial join dependency, either it's implied by the candidate keys or it isn't. We don't have to invent any new sort of join dependency, and don't forget that you are not looking at a particular join of a set of projections of a particular state of the table, you are making an assertion about joins of that set of projections which hold regardless of ther current content of the table; if it's possible for the table to hold a set of rows where the projections wouldn't join to make up the original, those projections make no join dependency. So it's really quite simple: if a column which occurs in more than one of a set of projections is nullable, that set of projections can't be a join dependency (Codd would have been completely happy with that statement, because NULL=NULL evaluates to false in the condition of a join) - and the set of join dependencies on this table are precisely the set of join dependencies on the related table which permits no NULLs restricted to those sets of projections in which no NULLable column occurs more than once. So there may be fewer join dependencies than there are on the NULL-free version, but there are certainly none that would not also be join dependencies on the NULL-free version. Now either these join dependencies are all trivial or they aren't so we know whether the table is in 5NF or not using the original definition or 5NF. And remember that RM2 does not permit nullable columns in primary keys (and hence doesn't permit them in candidate keys) so there aren't any problems from that end either.

    Tom

  • GSquared (4/27/2011)


    On the circular reason point, saying "all relations are in 1NF, thus this is automatically in 1NF because we say it's a relation", is circular reasoning. It's not proof of anything.

    It's not a statement of definition. E.g.: "Apples are, by definition fruit. Hence, anything which is not a fruit, is also not an apple". That's a definition, and thus not circular.

    There is nothing circular about saying all relations are in 1NF and that a particular description of a data structure is a relation. My implicit assumption was that you knew what a relation was and didn't need me to repeat the usual definitions here.

    A relation is a set of tuples with named and typed attributes. (Alternatively, relations are occasionally viewed as sets of tuples with ordered, unnamed attributes but that's really a matter of notational convenience - for all practical purposes as far as the relational model is concerned the attributes are normally named).

    More formally, a relation is the subset of the cartesian cross-product of a set of domains (see the first Mathworld reference in the links below)

    You can find the definition of a relation from many different sources:

    http://mathworld.wolfram.com/Relation.html

    http://mathworld.wolfram.com/Set.html

    http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf

    http://portal.acm.org/citation.cfm?id=362685

    Because a relation is a set of tuples, the tuples within a relation are distinct (not duplicate) and not ordered. They remain unordered no matter what the tuples themselves are. The fact that relations are unordered doesn't mean there's any special restriction on the tuples that a relation contains, it just means that whatever the tuples are the relation itself doesn't ascribe any order to them (because it is a set and sets are unordered).

    If 1NF is automatic for relations, are the higher NFs also automatic? If not, why? How can data, in tuples, in relations, be in violation of 5NF, and still be "relational"?

    Once you know what a relation is you just apply the definitions of the Normal Forms and you will see when they are being satisfied and when they are not. It's not especially difficult to do that and there's no need for me to go through it here. You can find examples in the Alice Book (Abiteboul, Hull and Vianu's book I cited above.

  • Gentlemen,

    You are welcome to debate this and it is interesting, but please keep it professional and agree to disagree about things that you cannot resolve. Please do not call each other names or insult someone because they do not agree with, or do not understand your position.

    Spirited debate can take place without being unprofessional. If you disagree or cannot control yourself, please step away from the keyboard. It is unbecoming of you all to take this theoretical debate to the elementary schoolyard.

  • Steven993 (4/27/2011)


    ....

    You have never read 1NF definition or even the Relation definition.

    Tht's a disgraceful statement. You owe Gus an apology.

    Steven993


    Example of 1NF definition, from the only book I have right now :

    Database Systems - The complete Book Second Edition @2009 - Jeffrey Ullman

    Page 103:

    First normal form is simply the condition that every component of every tuple is an atomic value.

    Well, he got it wrong, didn't he, even according to your next message where you say "1NF is the definition of a relation"

    Steven993 (4/27/2011)


    As we said there is no order notion in 1NF.

    1NF is the definition of a relation.

    You also claim that 1NF does not require that the tuples be unordered, but presumably you are aware that a relation is an (unordered) set?

    So Chris Date, in the passage Gus quoted (why did you claim he hadn't looked at a definition when he had given you chapter and verse?), was talking complete nonsense? Well, those of us who think he sometimes made an excellent job of doing just that might be surprised to see one clearly in the Date camp on what the relational model is taking that view, but oddly enough I would be unhappy for you to surprise us on this one as I believe Date's statement to be correct (as are many of his statements) and I don't like to see someone who cares about this stuff (even someone whos dislikes MVL and NULL) bogging down pig-headedly in an untenable position. Look, here's Chris Date's statement again:

    C.J.Date, "What First Normal Form Really Means" available here


    1. There's no top-to-bottom ordering to the rows.

    2. There's no left-to-right ordering to the columns.

    3. There are no duplicate rows.

    4. Every row-and-column intersection contains exactly one value from the applicable

    domain (and nothing else).

    5. All columns are regular .

    ...the overriding requirement, to the effect that the table must directly and faithfully represent a

    relation, follows from the fact that 1NF was originally defined as a property of relations, not tables."

    And according to you, a relations can be an ordered set of rows, not an unordered set of rows? Actually I think you got that one wrong and Chris Date got that one right. Similarly on absence of column ordering, absence of duplicates, and regularity: I guess you think those too are not essential for something to be in 1NF?

    If you can see anything in Date's statement that isn't part of the definition of a relation (and hence of 1NF) please point it out. In my view there's nothing whatever that's the least bit controversial in there, except that point 5 is open to misinterpretation as constraining the underlying physical mechanisms and people who don't know any domain theory (and Date too, although I would explect even a pale blue mathematician to know some domain theory) misinterpret 4 as banning NULL and some real idiots (certainly not including Date, or anyone else with a clue) think it means that values can't have a complex structure. Much as I disagree with some of Date's views (and think his big popular book is utterly awful, especially the latest versions of it) I'm pretty sure he got that statement of what 1NF entails absolutely correct.

    Tom

  • David Portas (4/27/2011)


    GSquared (4/27/2011)


    On the circular reason point, saying "all relations are in 1NF, thus this is automatically in 1NF because we say it's a relation", is circular reasoning. It's not proof of anything.

    It's not a statement of definition. E.g.: "Apples are, by definition fruit. Hence, anything which is not a fruit, is also not an apple". That's a definition, and thus not circular.

    There is nothing circular about saying all relations are in 1NF and that a particular description of a data structure is a relation. My implicit assumption was that you knew what a relation was and didn't need me to repeat the usual definitions here.

    Nothing circular about that. Not at all. However, that's not what was written. What was written is that "this data is a relation and thus isn't ordered because relations aren't ordered", in response to my assertion that having a column in the row that defines the order of the rows, makes it ordered. That argument, stating that I'm wrong because "I say it's a relation, therefor it doesn't have ordered rows, because relations don't have ordered rows", that's circular logic. Using an assertion to prove itself is the very essence of circular reasoning. I'm saying the data is ordered, and thus isn't a relation, the counter-argument doesn't address the ordering of the rows, it asserts that I'm wrong, "just because", in essence. THAT is what I'm objecting to.

    A relation is a set of tuples with named and typed attributes. (Alternatively, relations are occasionally viewed as sets of tuples with ordered, unnamed attributes but that's really a matter of notational convenience - for all practical purposes as far as the relational model is concerned the attributes are normally named).

    More formally, a relation is the subset of the cartesian cross-product of a set of domains (see the first Mathworld reference in the links below)

    You can find the definition of a relation from many different sources:

    http://mathworld.wolfram.com/Relation.html

    http://mathworld.wolfram.com/Set.html

    http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf

    http://portal.acm.org/citation.cfm?id=362685

    Because a relation is a set of tuples, the tuples within a relation are distinct (not duplicate) and not ordered. They remain unordered no matter what the tuples themselves are. The fact that relations are unordered doesn't mean there's any special restriction on the tuples that a relation contains, it just means that whatever the tuples are the relation itself doesn't ascribe any order to them (because it is a set and sets are unordered).

    Exactly. All of that is already assumed as a basic set of postulates in this. Thank you for explicitly including them. Nothing new here, and no objections to it. It doesn't however, address the issue about the rows having a value that inherently makes them explicitly ordered. That still hasn't been addressed.

    To clarify, either the data within a row (I won't call it a tuple here) can give the row an inherent order, violating the definitions of both "relational data" and "1NF", or that piece of the rules about rows is meaningless because there's absolutely no way for a "row" of electronic data to ever have an order, or you (or someone) needs to come up with a counter-example of how rows could possibly have an order without it being part of the data in the row. Is the rule meaningless and just "noise", because that's what the assertions thus far seem to indicate.

    The reason rows shouldn't be ordered, by their data or otherwise, is the same reason that you shouldn't store the same datum in multiple rows. It violates the whole reason for storing data relationally in the first place. An atomic update to a datum should require only an update to one row-column intersection. That's what relational data storage was created to solve in the first place. In rows that have a "1,2,3" datum to force their order, inserting a value between 2 and 3 (for example), requires updating rows outside of the atomic datum being added. To avoid that, you can use "padding" (like used to be done in old-fashioned BASIC programming, using values 10, 20, 30 instead of 1, 2, 3), or by using an extensible data type, like decimal or float, which allows for the same thing by adding values in that space (row 2.5, in this example).

    Having a "this is line 1 of the address" structure, forces atomic data modification to modify data outside of the scope of the datum, or attempting to avoid that through tricks with limited scope like padding or "partial rows" (that's what 2.5 would mean as a "this is row 2 1/2" value). Either type can run into issues, since either only allows so many inserts in between before they run out of available values and again require non-atomic data modification.

    Hence, ordered rows make that table into something other than a relation. That table violates basic rules of normalized, relational data, because the rows do have an order. That's what I'm asserting. Nothing that's been used to counter that assertion even addresses it. Either provide data on what "ordered rows" would be, and why the rule was created in the first place, or take a look at the fact that ordered rows are ordered rows.

    If 1NF is automatic for relations, are the higher NFs also automatic? If not, why? How can data, in tuples, in relations, be in violation of 5NF, and still be "relational"?

    Once you know what a relation is you just apply the definitions of the Normal Forms and you will see when they are being satisfied and when they are not. It's not especially difficult to do that and there's no need for me to go through it here. You can find examples in the Alice Book (Abiteboul, Hull and Vianu's book I cited above.

    I already do know what a relation is and what normalization is. You are assuming that my disagreement with you proves ignorance on my part. That's an assumption on your part that you have yet to even address, much less prove. You keep assuming that I'm asking you questions, which is incorrect. I'm challenging your assumptions and asking you to prove them.

    I'm not asking you to define "relational", I'm asking you to defend the proposition that "rows that have a value in them which indicates the order of the rows are not, actually, rows with an order". It sounds absurd in English, and it seems absurd to me. It violates the whole idea behind relational data stores in the first place, as outlined above. I don't see how you can justify "rows that say they have an order aren't actually rows with an order". They sure seem like it to me.

    Prove me wrong. That's all I'm asking. That's all I've been asking. I wasn't interested in insults, receiving or giving. You guys felt the need to fling them around, without justification.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (4/27/2011)


    I'm not asking you to define "relational", I'm asking you to defend the proposition that "rows that have a value in them which indicates the order of the rows are not, actually, rows with an order". It sounds absurd in English, and it seems absurd to me. It violates the whole idea behind relational data stores in the first place, as outlined above. I don't see how you can justify "rows that say they have an order aren't actually rows with an order". They sure seem like it to me.

    Prove me wrong. That's all I'm asking. That's all I've been asking. I wasn't interested in insults, receiving or giving. You guys felt the need to fling them around, without justification.

    The ordering you are referring to is defined by integer values of one attribute in each row in a table. Call that attribute O. Let S be the domain of all the integer values that are permitted for O. Let the relation R be the desired subset of the cartesian product of a set of domains that includes S. R is a relation by the usual definition. It is a subset of a cartesian product and consists of tuples with the values seen in Steven's example. Because R is a set of tuples those tuples are unordered. It doesn't matter that the values of one attribute of R just happen to be the same values that also define a sequence - R is still a relation because it is a set of tuples. Having now written this, it seems like making heavy work of something very simple: that a relation is the product of some set of domains. At least I hope I've clarified something.

    To clarify, either the data within a row (I won't call it a tuple here) can give the row an inherent order, violating the definitions of both "relational data" and "1NF", or that piece of the rules about rows is meaningless because there's absolutely no way for a "row" of electronic data to ever have an order, or you (or someone) needs to come up with a counter-example of how rows could possibly have an order without it being part of the data in the row. Is the rule meaningless and just "noise", because that's what the assertions thus far seem to indicate.

    It's not meaningless or noise to state that the tuples of a relation are unordered. There's no harm in pointing out a fundamental property of relations, even at the risk of repeating the obvious, especially if it helps identify possible problems when using a DBMS that is not truly relational (I mean a DBMS that uses tables that may or may not be accurate representations of relations). Singling out the fact that relations are unordered helps us spot when a system is using non-relational features. For example, if the table in question did NOT have a column with a sequence number but instead the sequence was maintained through index structures alone in an ISAM style sequential access model then a relation with the same attributes as that ISAM table would be missing the information about the sequence.

  • Tom, thank you very much for your thoughtful and thorough response to my post. For me you have made this thread worth continuing. I will mull over your reply and respond soon - but it is late just now and I'm going to bed.

  • David Portas (4/27/2011)


    GSquared (4/27/2011)


    On the circular reason point, saying "all relations are in 1NF, thus this is automatically in 1NF because we say it's a relation", is circular reasoning. It's not proof of anything.

    It's not a statement of definition. E.g.: "Apples are, by definition fruit. Hence, anything which is not a fruit, is also not an apple". That's a definition, and thus not circular.

    There is nothing circular about saying all relations are in 1NF and that a particular description of a data structure is a relation. My implicit assumption was that you knew what a relation was and didn't need me to repeat the usual definitions here.

    The problem, David, is that Steven (and maybe also you) are saying plainly and clearly that Date's definition (quoted by Gus) is wrong; but it is completely correct. An auxiliary problem is that Steven has sometimes been saying it in an extremely offensive and unpleasant manner, but with luck he'll get over that. And both of you appeared to be saying that it's perfectly OK for a relation to be an ordered set; but it isn't. What Gus wants to see is some reasoned argument pointing out why his amusing assertions about Steven's ideas contravening 1NF (actually, contravening the definition of a relation) are incorrect. Neither of you has made any attempt to provide that - just bald assertions that it's wrong because it's wrong. And then the other David asks: if that stuff doesn't contravene Chris Date's definition of 1NF (in the paper/essay/what-have-you that's in his DB Writings 2001-2006 book published by Springer Verlag as well as on the dbdebunk website and has been quoted accurately in this topic), specifically the first clause of that definition - the one about the rows not being ordered, please can he have an example of something that would contravene it (because maybe if that doesn't contravene it the statement is actually vacuous) and gets back the completely senseless and pointless answer "1NF says nothing about order" which completely ignores the Date definition that was quoted and fails utterly to address the issue. Honestly, although somewhat amusing it is also appalling and I don't know whether to laugh because it's such a farce watching Gus and David B run rings around the pair of you, or cry because someone who ought to know better is descending to this level. And when you throw in a statement that plainly suggests that there's nothing wrong with saying "a particular description of a data structure is a relation", as if a relation were some kind of "description", I have to wonder whether we are on the same planet!

    A relation is a set of tuples with named and typed attributes. (Alternatively, relations are occasionally viewed as sets of tuples with ordered, unnamed attributes but that's really a matter of notational convenience - for all practical purposes as far as the relational model is concerned the attributes are normally named).

    More formally, a relation is the subset of the cartesian cross-product of a set of domains (see the first Mathworld reference in the links below)

    Well, that's three mutually contradictory definitions right there, and the last one is utterly incorrect mathematics, containing a seriously erroneous addition to the brief but correct definition at mathworld.wolfram.com. A cartesian product is the product of a list (which can contain repeats) not of a set (which can't), otherwise the brief remark at wolfram.com about "a relation on A" would make no sense at all (and elements of a cartesian products are indeed ordered lists - tuples without attribute names - so these relations are not much like relational calculus relations). But of course the normal idea of a relation in relational algebra and relational calculus is not the simple set-theoretic relation (cartesian product, tuples with order instead of maps on a name-set) but a signature with a set of maps.

    So here's a better definition than any of them: a relation A is a signature S(A) which is a map from (some finite set of) names to the universe of domains, together with a set R(A) of maps on dom(S(A)) such that, for each r in R(A) and each n in dom(S(A)), r(n) is in S(A)(n).

    That definition enforces attribute naming with no repeated names, column-order independence, row order independence, no duplicate rows, and atomicity. It also ensures that the only thing apart from the rows is the signature (which is Date's clause 5). So it covers everything that Date covered with his version, and in my opinion covers it much more clearly and succinctly (but only for mathematicians, so it's not necessarily a good definition for non-mathematicians; which is probably why Date didn't use it - he was writing for non-mathematicians). You can add an ordering on the attributes (so that the columns have an order as well as having names, so they can optionally be addressed by number instead of by name) by adding a total order O(A) on S(A) to the definition, which is effectively what many interfaces to perverted versions of the model do (eg ADO does that for T_SQL), but I don't think that is a sensible thing to do as part of the basic definition of a relation.

    There is of course a problem there, as with all the definitions: it talks about a snapshot of a relation at a point in time; but when we talk about normalisation and hence about functional dependencies, multi-functional dependencies, join dependencies, and über-join dependencies (this last for 6NF) we are talking about constraints on what values the relation can take on, not properties of a particluar value; so the common usage when talking about normalisation is that the relation is the signature, not the content, plus of course the dependencies imposed by the requirement that it models some real world thing (because without those dependencies normalisation has no meaning).

    Tom

  • Tom.Thomson (4/28/2011)


    The problem, David, is that Steven (and maybe also you) are saying plainly and clearly that Date's definition (quoted by Gus) is wrong; but it is completely correct.

    [etc]

    And both of you appeared to be saying that it's perfectly OK for a relation to be an ordered set; but it isn't.

    [etc]

    What Gus wants to see is some reasoned argument pointing out why his amusing assertions about Steven's ideas contravening 1NF (actually, contravening the definition of a relation) are incorrect. Neither of you has made any attempt to provide that - just bald assertions that it's wrong because it's wrong. And then the other David asks: if that stuff doesn't contravene Chris Date's definition of 1NF (in the paper/essay/what-have-you that's in his DB Writings 2001-2006 book published by Springer Verlag as well as on the dbdebunk website and has been quoted accurately in this topic), specifically the first clause of that definition - the one about the rows not being ordered

    I'm so tired to repeat the same things over and over..

    When you'r not ignoring what I've written you just claim I said the exact opposite........

    Steven993 (4/27/2011)


    Create a table without any key, or generate a query that return duplicate row.

    That's not a relation. Because by definition there is no duplicates in tuples of a relation (it's a set).

    Steven993 (4/27/2011)


    David Burrows (4/27/2011)


    But I was referring to Date's 1NF and in particular to the reference

    There's no top-to-bottom ordering to the rows

    Did Date specify this?

    To what does it appertain?

    There is no debate about this point...

    A relation is a set of tuples. A set dosen't have order. That's all.

    What is the definition of 1NF ?????????????

    The definition of 1NF is something like that :

    Steven993 (4/27/2011)

    I actually have Date on Database Writings 2000-2006 under my eyes.

    Chapter 8, 30 apges is dedicated to that topic.

    p127

    Definition

    A table is normalized - equivalently, it is in first normal form, 1NF- if and only it is a direct and faithful representation of some relation.

    There is nothing about order here !!!!!!!!!

    To violate 1NF a relation must violate the Relation definition...

    I'm of course assuming you know the definition of a Relation, already described by David...

    What you are referring to Chris Date definition of 1NF is that :

    GSquared (4/26/2011)


    [

    Here's the definition I'm citing:

    According to Date's definition of 1NF, a table is in 1NF if and only if it is "isomorphic to some relation", which means, specifically, that it satisfies the following five conditions:

    1.There's no top-to-bottom ordering to the rows.

    2.There's no left-to-right ordering to the columns.

    3.There are no duplicate rows.

    4.Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).

    5.All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].

    This is not a 1NF definition.......

    1NF deals with relation..

    That is a description of rules a table must follow must be read, to be considered as a relation....

    Steven993 (4/26/2011)


    What you are citing are the properties a TABLE must follow, how it must be read to be considered as a relation.

    Address(add_id, add_Name, add_zipCode, add_City, cust_id#)

    1 Personal 75000 Paris 3

    2 Work 75000 Paris 3

    AddressLine(add_id#, line_id, line_Content)

    1 1 '16 rue de la Paix'

    2 1 '4 rue de la Plaine'

    2 2 '1st Floor, 2nd door'

    That's 2 tables representing two relations.

    What the definition of 1NF is saying is that there is no importance in the row order you see. Because a relation is a set of tuples, and a set dosen't have order.

    I am obliged to order the row, to present them as a table. But in the relation behind that table there is no order !!!

    AddressLine(add_id#, line_id, line_Content)

    1 1 '16 rue de la Paix'

    2 2 '1st Floor, 2nd door'

    2 1 '4 rue de la Plaine'

    AddressLine(add_id#, line_id, line_Content)

    2 2 '1st Floor, 2nd door'

    1 1 '16 rue de la Paix'

    2 1 '4 rue de la Plaine'

    Thoses tables above are differents tables !!!! Because obviously the row order is different.

    But they represent the SAME relation. Beacause order dosen't matter.

    This is what 1NF say about order. It has nothing to tell about the values.............

    Same goes for attributes order.

    .....

  • Tom.Thomson (4/28/2011)


    The problem, David, is that Steven (and maybe also you) are saying plainly and clearly that Date's definition (quoted by Gus) is wrong; but it is completely correct...

    I shouldn't have taken the bait and attempted (badly) to explain something that is covered in great detail elsewhere. SQL Server Central isn't the place and I am not as good a writer / teacher as Date, Abiteboul and others. I do agree that Date's definition is correct and I never intended to say otherwise. However, the example under discussion is not very interesting so I don't feel inclined to discuss it further.

  • Steven993 (4/27/2011)


    David Burrows (4/27/2011)


    Your example shows violation of Codd's 1NF and is what I understood to be 1NF.

    My example, you mean AddressLine table ? It's is not a 1NF violation..

    No sorry, my bad :blush:, I should have been more specific. I was referring to

    S# CITY

    1 'London'

    2 'Paris'

    3 'Paris, Madrid'

    Far away is close at hand in the images of elsewhere.
    Anon.

Viewing 15 posts - 106 through 120 (of 144 total)

You must be logged in to reply to this topic. Login to reply