An Introduction to Database Design

  • kenambrose (1/22/2011)


    Hi Paul,

    What I try to do when I train beginners in DB work is to emphasize Codd's fundamental logical concepts- in layman's language.

    I start out with the basic concepts such as modeling of data processing, aligning a model of the data to the business meaning of real world processes and things (duplicates never exist in the real world so preventing them is a basic step in recording facts about the world) , why we seperate out complex business facts into seperate record keeping "entities", and "what are the common record keeping problems that relational principles address and how does it try to address them" , etc.

    I think it's a different opinion - from a trainers standpoint - of what's important for beginners to start with...

    I would imagine that the first thing you would do is explain the concept of entities and tables to hold them in a normalized form, just like Paul did. I would imagine the next thing you'd teach would be the concept of uniqueness, data integrity, and then DRI... but you have to start somewhere... just like Paul did.

    Like you (and Paul, believe it or not), I agree on the unique thing being important. I believe that'll make a fine second article. But, like I said, you have to start somewhere and I believe Paul's article laid an excellent foundation for additional teaching. Give the man a chance to follow up with the next important step... just like you would. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • <<I would imagine that the first thing you would do is explain the concept of entities and tables >>

    Actually, no, not at all. The first thing I do is give a little background as to the principles of modeling at a higher level of abstraction. Why it is so advantageous to abstract models from the real world, what are some of the limitations and benefits of such modeling, how is the concept beneficial in other domains?

    Then I talk about some of the principles of data modeling specifically. How businesses processes historically handle tremendous amounts of information but it is stored in individual people's heads and therefore not widely available to the organization at any point in time, easily lost, misunderstood when communicating with others due to lack of standardardization for semantics and terms, and lost over time as personnel changes.

    Then we can consider the limitations - again at a high level - of using computer record keeping systems. How it is simply impractical to capture ALL the knowledge in worker's heads due to space and resource limitations, how therefore we must "reduce" or identify the truely and minimally "essential" business facts that should be recorded, why the fact recording needs to be structured with a bit of mathematical rigor (thank you Dr. Codd), etc.

    No reference to entities or tables in the above. It is WAY premature...

    The above concepts are in my opinion the starting point for fundamental concepts of database. From that emphasis beginners start to view the problem in a more abstract and conceptually sound way, leading to many practical benefits when it comes time to create a model.

    It really is I think a different emphasis and consideration of what is good training for beginners. Maybe that comes from my actual training experience. Since I would have a lot of people in a class, the training was never about any one indvidual's fact domain. Too narrow, and again, misses the general principles.

    Having said that, Bob's story - as a documentary of how NOT to do DB design, certainly reflects a common reality. But it is a documentary, not a sound tutorial for beginners of how to approach this work in my opinion.

    Maybe that is where the disjoint is in the discussion. Is it a good tutorial/training document, or is it simply a documentary of poor practice that is quite common? You be the judge...

    p.s. It is not rocket science, please don't tell me the approach I recommend is too theoretical or complex for beginners. Just the opposite in my opinion, it is the sound and fundamental conceptual foundation for effective future DB work at any level.

    And I still stand by my earlier statement, the fact that almost NO ONE in this discussion recognized the data anomalies that would be generated should the suggested model be implemented. People who often build databases in our industry just don't get Codd...

    ken

  • I think the problem here is that there is no possible way to teach every single database design consideration on a single article. It's even hard to do in single book.

    I could easily point out numerous issues with the proposed database design. I work in this area and know that a design that supports retail transactions, over time, in any jurisdiction, in any legal environment, for any type of product or service requires hundreds of tables and thousands of columns.

    But I have a magic tool, called context, that lets me consume the information in a way that fits within the goals of the article. Sure, it means that I get to cringe at the address and name standards not being international. Sure, I'd love to impress others with how sales tax blows the whole simple design apart.

    One of the best tools an architect can have is to understand how eliciting requirements means not trying to eat the whole enchilada in one bite. It also means that I know that one can use iterations to finalize a design.

    The very best thing about this article is that while it brought up the concept of normalization, it did not focus on normal forms. It focused on *some* of the benefits of normalization without bogging down the discussion in a way that makes the vast majority of people run to watch Jersey Shore to feel better about their lives.

    Context is one of the most important tools an architect needs. Architects without context skills will find it very difficult to meet the needs of a wide audience of team members they work with.

  • kenambrose (1/22/2011)


    <<I would imagine that the first thing you would do is explain the concept of entities and tables >>

    Actually, no, not at all. The first thing I do is give a little background as to the principles of modeling at a higher level of abstraction. Why it is so advantageous to abstract models from the real world, what are some of the limitations and benefits of such modeling, how is the concept beneficial in other domains?

    Then I talk about some of the principles of data modeling specifically. How businesses processes historically handle tremendous amounts of information but it is stored in individual people's heads and therefore not widely available to the organization at any point in time, easily lost, misunderstood when communicating with others due to lack of standardardization for semantics and terms, and lost over time as personnel changes.

    Then we can consider the limitations - again at a high level - of using computer record keeping systems. How it is simply impractical to capture ALL the knowledge in worker's heads due to space and resource limitations, how therefore we must "reduce" or identify the truely and minimally "essential" business facts that should be recorded, why the fact recording needs to be structured with a bit of mathematical rigor (thank you Dr. Codd), etc.

    No reference to entities or tables in the above. It is WAY premature...

    The above concepts are in my opinion the starting point for fundamental concepts of database. From that emphasis beginners start to view the problem in a more abstract and conceptually sound way, leading to many practical benefits when it comes time to create a model.

    It really is I think a different emphasis and consideration of what is good training for beginners. Maybe that comes from my actual training experience. Since I would have a lot of people in a class, the training was never about any one indvidual's fact domain. Too narrow, and again, misses the general principles.

    Having said that, Bob's story - as a documentary of how NOT to do DB design, certainly reflects a common reality. But it is a documentary, not a sound tutorial for beginners of how to approach this work in my opinion.

    Maybe that is where the disjoint is in the discussion. Is it a good tutorial/training document, or is it simply a documentary of poor practice that is quite common? You be the judge...

    p.s. It is not rocket science, please don't tell me the approach I recommend is too theoretical or complex for beginners. Just the opposite in my opinion, it is the sound and fundamental conceptual foundation for effective future DB work at any level.

    And I still stand by my earlier statement, the fact that almost NO ONE in this discussion recognized the data anomalies that would be generated should the suggested model be implemented. People who often build databases in our industry just don't get Codd...

    ken

    I guess we'll simply have to agree to disagree at this point. Thanks for the feedback, Ken. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have made a number of small changes to the published text based on comments received so far. Thanks to everyone for their opinions and insights.

  • kenambrose (1/22/2011)


    <<I would imagine that the first thing you would do is explain the concept of entities and tables >>

    Actually, no, not at all. The first thing I do is give a little background as to the principles of modeling at a higher level of abstraction. Why it is so advantageous to abstract models from the real world, what are some of the limitations and benefits of such modeling, how is the concept beneficial in other domains?

    Then I talk about some of the principles of data modeling specifically. How businesses processes historically handle tremendous amounts of information but it is stored in individual people's heads and therefore not widely available to the organization at any point in time, easily lost, misunderstood when communicating with others due to lack of standardardization for semantics and terms, and lost over time as personnel changes.

    Then we can consider the limitations - again at a high level - of using computer record keeping systems. How it is simply impractical to capture ALL the knowledge in worker's heads due to space and resource limitations, how therefore we must "reduce" or identify the truely and minimally "essential" business facts that should be recorded, why the fact recording needs to be structured with a bit of mathematical rigor (thank you Dr. Codd), etc.

    No reference to entities or tables in the above. It is WAY premature...

    The above concepts are in my opinion the starting point for fundamental concepts of database. From that emphasis beginners start to view the problem in a more abstract and conceptually sound way, leading to many practical benefits when it comes time to create a model.

    It really is I think a different emphasis and consideration of what is good training for beginners. Maybe that comes from my actual training experience. Since I would have a lot of people in a class, the training was never about any one indvidual's fact domain. Too narrow, and again, misses the general principles.

    Having said that, Bob's story - as a documentary of how NOT to do DB design, certainly reflects a common reality. But it is a documentary, not a sound tutorial for beginners of how to approach this work in my opinion.

    Maybe that is where the disjoint is in the discussion. Is it a good tutorial/training document, or is it simply a documentary of poor practice that is quite common? You be the judge...

    p.s. It is not rocket science, please don't tell me the approach I recommend is too theoretical or complex for beginners. Just the opposite in my opinion, it is the sound and fundamental conceptual foundation for effective future DB work at any level.

    And I still stand by my earlier statement, the fact that almost NO ONE in this discussion recognized the data anomalies that would be generated should the suggested model be implemented. People who often build databases in our industry just don't get Codd...

    ken

    Based on the length of the article and your description above, you also wouldn't have gotten to the concepts that you're complaining are missing.

    I think you're missing the point that the audience of this article and the audience that you'd be training are very different groups. If you hit the average business user with your training, they're not coming to the second session. Most will find an excuse to walk out of the first session or simply check the web with their phones while you're talking. Your training is fine for people who want to be DB professionals, I'm sure, but the average business person doesn't want to "get Codd" or sit through all theory. Try to make them and you'll have ensure they'll stick with Excel.

    The way this article approaches it, showing them improvements in what they're doing now will get them hooked and you can bring them along little by little. You have to show them why this will be better for them right now or you lose them. They don't want to be DBAs or designers, they want a tool that will help them do their real job. If you don't know your audience, you won't have an audience.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • SQLkiwi (1/23/2011)


    I have made a number of small changes to the published text based on comments received so far. Thanks to everyone for their opinions and insights.

    You've folded the changes in quite nicely, Paul.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • << If you hit the average business user with your training, they're not coming to the second session.>>

    Not been my experience, and I just finished a contract at SLAC that proves your statement wrong.

    Now having said that, I will say SMEs initally usually don't *want* to start where I start them (considering what abstraction is about in general). But I usually nip that resistance in the bud by explaining to them the likely consequences - with examples - of getting abstraction *wrong*.

    You're other point is too silly to merit a response...

  • kenambrose (1/24/2011)


    << If you hit the average business user with your training, they're not coming to the second session.>>

    Not been my experience, and I just finished a contract at SLAC that proves your statement wrong.

    Single data points can't prove assertions of general trends wrong.

    You're other point is too silly to merit a response...

    When you understand why it's not silly, you'll understand why many people in this thread disagree with you.

    I think you'll find that there's been a degree of selectivity at work here -- on all sides. When you choose which contracts to go for, when the company decides to go with you or with someone else, when you explain your idea of the proper way to explain database design, all of these are filtering points that narrow down your interactions. So of course your experiences will for the most part reflect your world view. Same goes for everyone else here. I find it's useful, absent of objective data, to apply an equality presumption. So when one person says the average business person is OK with being taught theory, and someone else says the average person will leave/fall asleep/pay token attention during the theory part, I assume those are generally equal.

    But when one person says A and several others say B -- again, absent of objective data -- the most likely answer is B.

    Coming back to the practical discussion, I can tell you now that I have known, met, and trained a good deal of people who would not learn the theory before the practice -- and I've had good success injecting the theory into the practice as I was teaching it. Yes, this meant something similar to this article, allowing them to create databases which had flaws. But we went over those flaws in later sessions, and instead of just learning that they were wrong, they learned why they were wrong. They weren't just told the consequences of bad theory, they saw them first-hand.

  • kenambrose (1/24/2011)


    You're other point is too silly to merit a response...

    Not need for that type of attack. Stick to the subject. Either respond to a point or not. Keep the personal attacks out of it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • kenambrose (1/24/2011)


    << If you hit the average business user with your training, they're not coming to the second session.>>

    Not been my experience, and I just finished a contract at SLAC that proves your statement wrong.

    Now having said that, I will say SMEs initally usually don't *want* to start where I start them (considering what abstraction is about in general). But I usually nip that resistance in the bud by explaining to them the likely consequences - with examples - of getting abstraction *wrong*.

    If they're told they have to go, then they have to go. Doesn't prove they want to be there.

    You're other point is too silly to merit a response...

    You mean my comment about how you wouldn't have gotten to your point in an article of the same length? You just must not be very good at explaining things then because you sure don't hit that point in your first few paragraphs about how you would explain things to beginners.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Jeff I made no personal attack. I said the *point* was silly, not the person. And it is. Trying to tell me that *if I had written an article, I would have made such and such a mistake* is just a plain silly statement, no other word for it.

  • kenambrose (1/25/2011)


    Jeff I made no personal attack. I said the *point* was silly, not the person. And it is. Trying to tell me that *if I had written an article, I would have made such and such a mistake* is just a plain silly statement, no other word for it.

    I didn't say you'd have made any mistake, I said that given the length of the article and your description of how you'd go about it you wouldn't get to your point in the first article either. I was pointing out the length of the medium and how even in your description you'd either need a longer article or a second article to get to the point you're so insistant upon including in the first installment.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Then I misunderstood, I stand corrected, and I need to apologize to you for considering that you were just making some silly projection of what I would do.

    The point about length / effort needed is not silly at all, and it actually is an important one, which I considered raising earlier anyway.

    That is: Some things it just takes a certain amount of effort to be competent at. If you don't want to put in the effort, then you can't expect to do a competent job. And I absolutely DO explain this in my JAD sessions to the SMEs. It definitely comes up. I make the point. It takes a certain amount of serious intellectual effort to do this work properly. If you don't have the time or don't want to make the effort, don't expect results you will be satisfied with.

    Now it takes a lot of confidence to say that to department director's at the first couple JAD sessions. But I do. And the reason I do is because I explain that if we don't put in the intellectual effort, the likelihood they will be disappointed, or worse, have to REVISIT the problem again later, is very high.

    And if there is one thing most bueracrats dont want to do, it is REVISIT a problem in 6 months and rework.

    And, I have also had department directors who were awash in scattered, inconsistent, redundant, and painful spreadsheets spend their *weekends* developing conceptual models of their domains. Because they GOT it. About abstraction and the importance of doing it right.

    And the relief and satisfaction on their faces is a joy to behold when they finally have a model they feel is accurate, that they can communicate to their stakeholders, and that they have confidence in, for going forward with the rest of the work.

    I fully stand by my earlier points. This working is fundamentally about learning what abstraction is, how to do it properly for the needs of data modeling. That is the optimal starting point for beginners in my opinion.

  • kenambrose (1/25/2011)


    Now it takes a lot of confidence to say that to department director's at the first couple JAD sessions.

    I think what you're missing, and why people are arguing with you is that this is the first session. This article is getting started on design, with more parts coming.

Viewing 15 posts - 61 through 75 (of 125 total)

You must be logged in to reply to this topic. Login to reply