Database Design and Reference Tables

  • Ya... my arms got a little sore from all the surfing over the weekend, and I got a bit of a tan line from my snorkel mask on Sunday. But it's just like any other burden -- you have to find a way to make it through... one day at a time, <sniff> one...day..at..a...time.

     

    TroyK

  • OK-- back on topic...

    I've reviewed some of Date's recent writings on 1NF and have concluded that the table from the article in fact does violate 1NF, but not for the reasons others have mentioned.

    I came close when I mentioned that since there is exactly 1 value at each row/column intersection for the strDescription column, the table was in 1NF (this relies on the assumption that the picture in the article is a representative sample of the data).

    However, I ignored the strCode column, which allows NULLs. Since a NULL is not a value, the column does not have exactly 1 value in that column for every row. So not only can you not have more than one value (of the particular domain) for the column in a row, you can also not have less than one. ("Exactly one"). Therefore, the table isn't in 1NF.

    In the SQL world, we've gotten used to accepting NULL, though, so it doesn't bother us to talk of tables being in 1st (and higher) normal forms even when they allow NULLs in some of their columns.

    To close, this quote on 1NF is from "An Introduction To Database Systems (7th Edition)", Chapter 11, section 3, by Date:

    "First normal form: A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute."

     

    TroyK

  • Well now you've gone and started another holy war!  While I agree with Date's findings for the most part, I'm not quite prepared to accept his complete prohibition on NULLs, particularly since our current products are SQL (as opposed to relational). 

    I try to minimize their use, but since there is no workable alternative available, sometimes you have to deal with NULL. 

    I know that I'm allowing physical implementation to dictate logical design, but until the TransRelational DBMS becomes available and if it fulfils its promise, we really have no choice.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • I'm pretty close to agreeing with you...

    If we accept NULL as a "value" that has some kind of meaning, or that we can at least "deal with", and we assume that the ordering of the columns and rows aren't important, then we might accept that in some sense a SQL table is a faithful representation of a relation... or at least as close as we're going to get with our current products.

    From that perspective (which is where I was coming from originally), the article's table is [probably] in 1NF. It's just that, as stated earlier, we're dealing with an overly broad domain for the Description column, namely, something like "the set of all values that are rendered for display as choices within a particular application's interface".

    TroyK

  • What columns unqiuely identify each row of the generic lookup table?

    Per the implementation in the article I can add the row (16, 'CA', 'CALIFORNIA', 'Postal.State') even though the row (16, 'CA', 'CALIFORNIA', 'Customer.State') already exists.  I'm now storing the same information multiple times in the same database - in the same table even.  Talk about your potential for "data drift."

    Additionally I can add (28, NULL, 'CALIFORNIA', 'Office.State') to the same table.  I can also add 1,000 other rows for "CALIFORNIA".

  • As long as we're beating a dead horse...

    Since the example in the article doesn't provide the DDL, you might assume that the key has been (or could be) defined across strDescription and strCategory (who names their columns like this?).  In all likelyhood, those who use MUCKs are going to declare the key on the intID column, but let's just pretend that this person knows enough to at least declare a unique constraint on the real key...

    If that's the case then you also have to assume that there is a valid reason (we're still pretending here) for wanting to maintain the customer.state separate from postal.state.  In which case, the two categories would be stored in separate tables in a properly designed database and you could have 'California' in each one.

    This just highlights another logical problem with MUCKs.  If a "code" in a real database truly can be applied to multiple entity types, then it's easy enough to declare multiple FK relationships.  In a MUCK that has a category column it is very messy to accomplish the same thing.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • As long as we're beating the dead horse, might as well break out the shotgun:

    I'm thinking along similar lines as you with the declaration of the primary key.  We obviously can't use strCode as a Primary Key or part of a primary key (nullable).  Ditto on the naming conventions - VB6 programmers would be the only people I could think of who would name their columns like this - because of an MS aberration of the so-called 'Hungarian notation'.

    But anyway, if we declare the PK as (strDescription, strCategory), we're using the data itself as a PK.  Sort of defeats the purpose of a PK and any potential optimizations it might provide.

    And I agree that there's no reason to store the state information multiple times; although with this type of design you can expect a programmer to add the same attributes into the table multiple times rather than try to hunt through the table for existing codes that might be applicable.  If you are listing States that have offices for your company - as opposed to a complete listing of all states - you will duplicate the data over and over using this method.

  • Since this board's quote feature only shows the first part of the post, here's the part I'm addressing:

    "But anyway, if we declare the PK as (strDescription, strCategory), we're using the data itself as a PK.  Sort of defeats the purpose of a PK and any potential optimizations it might provide."

    This looks like a confusion of logical-physical (the result of SQL implementing PK's as physical indexes). As a logical construct, a PK has nothing to say about performance, and nothing but the data [loosely speaking!] could inform any of the candidate keys which may be designated as the PK. (I may be misunderstanding -- if by "data" you are referring specifically to the strDescription column and no other, then I see your point).

    When we talk about the index SQL uses to implement the PK, though, then it may be time to be concerned about the size of the columns involved.

    By the way, Don's point about the difficulty of assessing the design in the absence of the actual DDL for the table is a good one. Has everyone noticed how often we have to use the word "assume"?

     

    TroyK

  • I was going to ask Mike to clarify that comment too, because if I read it correctly, it betrays a pretty serious misapprehension of both the logical need for keys and the physical implementation of them.  But I wanted to give him the benefit of the doubt.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Sorry about the confusion.  Yes I am talking about using the column strDescription as the PK, or even as part of the PK.

    If we imagine a single PK in the generic reference table as being ('CA', 'CALIFORNIA'), then relating it to another table we have to include both columns in the FK; to me this defeats the purpose of the PK/FK system.  Might as well go all the way and not even create a lookup table...  If you're repeating 'CA', 'CALIFORNIA' in two columns of every California address in an Address table, what's the point of even having a reference table that tells us 'CA' is the code for 'CALIFORNIA' (possibly telling us this multiple times, as well as telling us that 'CA' is the code for 'CANADA' and 'CA' stands for 'COMPUTER ASSOCIATES', all in the same table)?

    When I mentioned "optimization" (admittedly not the best choice of words since people automatically assume speed, storage and other compiler "optimizations"), I was thinking in terms of optimal logical design.  Basically what normalization provides by removing repetitive data, transitive dependencies, etc. from your database design.  The physical side effects in some cases *might* include physical speed, indexing, query compilation and/or storage space improvements as well; though (AFAIK) physical optimizations are not necessarily a requirement of the ANSI SQL Standard.

  • Sorry for the misunderstanding.  As I pointed out in another post, if you are using the strDescription column as any part of the PK, then you will be duplicating the strDescription column in referenced tables under the PK/FK system.  Repeating your data in tables throughout the database pretty much defeats the logical need for keys and reference tables; if you're going to do that, you might as well revert back to COBOL or flat files and call it a day.

    Basically if you're going to abolish the "relational" from your "relational database" by finding ways to circumvent the benefits of the system, you may as well use another data storage and retrieval system.

  • I think of this as an example of a FoxPro database programmer that thinks he has graduated to SQL Server.  While it will work for small implementations and in small minds, this is REALLY bad design. PERIOD.

    Personally I vote for going to 5th normal.  🙂

    I do that and then back off, maybe to 3rd, but not less than that.  Unless of cource, I am working on a data warehouse/mart project.

     

    Remember it is always easier to denormalize through views than it is to normalize a bad structure.

  • My first issue:

    I often use lookup tables for more than one other table . What would I put in the strCategory column? I would need a child table for multiple categories.

    My second was covered in ...madness.asp (maybe my first was covered in other posts. I didn't read ALL of them.)

  • What kind of category?

    Depending on how they are used, you can use a lookup across multiple tables.  Category for example could have 3 colums and be related to another table, I.E.

    tlkpCategoryGroup           tlkpCategory

    CategoryGroupID             CategoryID

    CagetoryGroupDesc         CategoryGroupID

                                       CategoryDesc

    Then you can limit what it lists for the multiple uses of it.

  • The "single lookup" may or may not make sense in anyone's particular situation.  What stuns me is that it has become a "hot issue" - yet 90%of the database programmers I meet haven't got even a passing familiarity with relational database theory and practice.  After 25 years of it, let me point out you can always choose to ignore theory once you know it.

    There is no one solution to normalizing data, and this may be right for some places.  But it might be an interesting excercise to take it all the way to sixth normal form (which you'd never place in production) and then recombine to fifth - since there is probably more than one way to do it, and this will tend to reveal more possibilities as you take the totally decomposed objects and then make sensible objects from them.

    Please, don't forget to read your Codd & Date, and Stonebreaker, etc, while we are reading all the books we need just to "get the work done".   You WILL make better decisions in design.

    But why are we even argueing about consolidated lookup tables?  Doesn't make much sense to me, but I don't know your data.

    Roger L Reid

Viewing 15 posts - 76 through 90 (of 101 total)

You must be logged in to reply to this topic. Login to reply