Modeling Design / Approach

  • gyoung 93471 (3/21/2012)


    Given that I don't see significant database operations on buildings and floors, I wouldn't isolate them.

    Therefore, two tables work nicely and cleanly.

    1) create a table for rooms (room_id, room_name, building_name, floor)

    2) create a table for computers (computer_id, computer_detail {several columns for this}, user_id {default:0}, room_id {NOT NULL - receiving room?})

    Why would I be wrong (besides being too lazy/illiterate to write the sql)?

    Well... just because you are ignoring basic data modeling, data normalization rules?

    Think about this... the three-thousand rooms "Montecitos" hotel in Las Vegas... do you really want to repeat "Montecitos" three-thousand times? one for each room? not to mention somebody may type the name as "Montecito" or "Montesitos".

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Paul,

    I greatly appreciate your answer as I am trying to learn - to make sure I do things in a good way. Do the guidelines (rules) allow flexibility? For example, most places I've worked that have multiple buildings only have two or three and the number of buildings and floors was more or less static. This is why I wrote for my design that I foresaw "no significant changes to building or floors." The database we are designing here is really for the computers (at least to my understanding) and I would assume the architecture of it would not have to satisfy both my two building 100 person company and Tom Selleck's Monticeto (with many buildings and thousands of rooms overlooking swimming pools). Also, for the database I "designed", there are ways to ensure the consistency of building and room names that should be there regardless of the layout of the tables. From my experience, there just isn't an issue.

    Or is the issue that good database design is always scalable and always follows the rules?

  • It is about normalization and repeating data is a clear indication of under normalization (even if you have ways to ensure the data is consistent).

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • gyoung,

    There is something we call best practices which basically means we do stuff in certain ways that have proven over time to be the most efficient ones in terms of design, performance, scalability and maintenance.

    When designing an OLTP database, at the data modeling phase, we build an ER Model then, based on such a model we build a physical one. During the process of building the physical model we normalize the data structure to a certain level.

    The design you are proposing appears to be a mix of 1NF and 2NF, not totally sure because I do not have access to the complete model. Problem is, it is usual to go not lower than 3NF (Third Normal Form) for a database expected to support and OLTP system.

    Your note about no much activity on certain table (buildings?) shouldn't really affect the way you approach data modeling; we call these "low volatility" tables which in extreme cases are static tables - nevetheless we apply data modeling rules no matter the volatility of the underlying tables.

    Hope this helps.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Paul Berzukov,

    Thank you again for your helpful responses (and sorry it took me awhile to get back). I can see it's going to take me some time before I see the necessary advantages of always strictly following normalization to 3NF - I assume the advantages are in building queries and using tools. I think best practices are important to understand and indeed I want to know what they are. For the example I gave (and I feel it is a good example, need wise, of a real-world simple database for keeping track of computer information), I do not see it having any redundant information nor having any downside except that it doesn't adhere to the established best practices for OLTP. Yet I can see that splitting it into one more table in order to satisfy 3NF doesn't have to cramp my style either - there isn't any disadvantage to following best practices. Except that, perhaps, I could end up with 30% (or some percentage) of additional tables to worry about - which could be a lot of tiny tables in a bigger database and schema. As I programmer (not so much with databases), I've always hated code that had a million nebulous function calls to tiny functions. I rarely separate out a small section of code that isn't going to be used elsewhere just for the sake of modularization as many do. I like code that is right in front of me. I'll have to trust that database programming is different.

    Scalability is an aspect of databases that I am highly interested in learning about. I know it can make or break a database. Modifying an existing database structure doesn't necessarily have to "break" the database either. I would think one can be ok as long as the table structure is fixable to make it scale when needed. In our computer database (with room & building) example, one could start with a building table (for scalability) even if the company had only one tiny building, just because ten years from now the company might lease an additional tiny building. Perhaps I have everything to learn (?) about scalabilty because in our example I don't see any real-world scalability advantage of having the separate building table (should I?) or not unless there are more than a couple buildings to start with.

    Gary Young

  • gyoung 93471 (3/28/2012)


    Paul Berzukov,

    Thank you again for your helpful responses (and sorry it took me awhile to get back). I can see it's going to take me some time before I see the necessary advantages of always strictly following normalization to 3NF - I assume the advantages are in building queries and using tools. I think best practices are important to understand and indeed I want to know what they are. For the example I gave (and I feel it is a good example, need wise, of a real-world simple database for keeping track of computer information), I do not see it having any redundant information nor having any downside except that it doesn't adhere to the established best practices for OLTP. Yet I can see that splitting it into one more table in order to satisfy 3NF doesn't have to cramp my style either - there isn't any disadvantage to following best practices. Except that, perhaps, I could end up with 30% (or some percentage) of additional tables to worry about - which could be a lot of tiny tables in a bigger database and schema. As I programmer (not so much with databases), I've always hated code that had a million nebulous function calls to tiny functions. I rarely separate out a small section of code that isn't going to be used elsewhere just for the sake of modularization as many do. I like code that is right in front of me. I'll have to trust that database programming is different.

    Scalability is an aspect of databases that I am highly interested in learning about. I know it can make or break a database. Modifying an existing database structure doesn't necessarily have to "break" the database either. I would think one can be ok as long as the table structure is fixable to make it scale when needed. In our computer database (with room & building) example, one could start with a building table (for scalability) even if the company had only one tiny building, just because ten years from now the company might lease an additional tiny building. Perhaps I have everything to learn (?) about scalabilty because in our example I don't see any real-world scalability advantage of having the separate building table (should I?) or not unless there are more than a couple buildings to start with.

    Gary Young

    Developing the data model to 3rd normal form helps to understand the problem domain and eliminates most (not all, that is why there is BCNF, 4th, 5th, 6th normal forms, and maybe others) of the issues you may developing an appkication.

    In true third normal form, for instance, a Sales Order would not have any customer information in the order header record except the Customer Number (the link betwen Customer and Order), nor would the Order Line have any invormation regarding a product except the Product Number from inventory.

    In reality, however, you would store this information in the Order Header or Order Line Item as you would want to know this information at the point in time that the order was taken. This is the denormalization that takes part as you make design decisions moving from a logical or conceptual design to a physical design.

    This is why most developers talk about designing to 3rd normal form then denormalizing where it is needed. It is a design decision and represents a duplication data, and you have to decide how you will handle it. Perhaps you don't allow changes to customer information in the order header for one using the example I used.

  • Lynn Pettis (3/28/2012)


    In true third normal form, for instance, a Sales Order would not have any customer information in the order header record except the Customer Number (the link betwen Customer and Order), nor would the Order Line have any invormation regarding a product except the Product Number from inventory.

    Let's be clear about this though. "True" third normal form does not in any way imply that you can't record the customer information that was current when an order was placed. Satisfying normal form just requires (roughly speaking) that you decide what dependencies you want to represent and that you ensure those dependencies are implied by keys and nothing but the keys in your database.

    I wouldn't recommend denormalizing below 3NF without very good reason. In the example mentioned by Lynn, denormalization surely is not going to help you record accurately the history of changing customer data. Rather than denormalizing, it would be more prudent to decide what determines the customer details (for example, is it really order or is it actually the customer number plus the datetime?) and let that inform the decision about where the customer details should go and what the key(s) of those tables should be.

  • sqlvogel (3/29/2012)


    Lynn Pettis (3/28/2012)


    In true third normal form, for instance, a Sales Order would not have any customer information in the order header record except the Customer Number (the link betwen Customer and Order), nor would the Order Line have any invormation regarding a product except the Product Number from inventory.

    Let's be clear about this though. "True" third normal form does not in any way imply that you can't record the customer information that was current when an order was placed. Satisfying normal form just requires (roughly speaking) that you decide what dependencies you want to represent and that you ensure those dependencies are implied by keys and nothing but the keys in your database.

    I wouldn't recommend denormalizing below 3NF without very good reason. In the example mentioned by Lynn, denormalization surely is not going to help you record accurately the history of changing customer data. Rather than denormalizing, it would be more prudent to decide what determines the customer details (for example, is it really order or is it actually the customer number plus the datetime?) and let that inform the decision about where the customer details should go and what the key(s) of those tables should be.

    Well in truth if you include the customer's current data at the time of the order in the order table you can't possibly be in 3NF because it is a direct violation of 2NF (Eliminate Redundant Data). That is why Lynn says this is a typical and usually very good type of thing to denormalize. It makes sense that you would want to have this historical information but it is no longer 3NF.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (3/29/2012)


    Well in truth if you include the customer's current data at the time of the order in the order table you can't possibly be in 3NF because it is a direct violation of 2NF (Eliminate Redundant Data). That is why Lynn says this is a typical and usually very good type of thing to denormalize. It makes sense that you would want to have this historical information but it is no longer 3NF.

    2NF does not mean "eliminate redundant data". Even if it did, I don't think it's the issue here. 2NF means that there are no partial key dependencies on nonprime attributes. It's not obvious to me why having the customer data from the time of the order in the order table would lead to a partial key dependency on a nonprime attribute in that table. If it did then I certainly wouldn't put that data there. My point was precisely that I would want to record the customer's data from the time of the order in a normalized way (e.g. BCNF or 5NF) rather than a denormalized way.

  • Maybe I wasn't clear.

    Simplified, 3nf:

    Customer

    ----------

    CustomerNumber

    CustomerName

    CustomerAddress

    CustomerCity

    CustomerState

    CustomerPostalCode

    ...

    OrderHeader

    -------------

    CustomerNumber

    OrderNumber

    OrderDate

    ...

    OrderLine

    ----------

    OrderNumber

    OrderLineNumber

    ProductNumber

    ...

    At this point, we have normalized the data and identified the attributes needed to link data together between the tables.

    Moving forward to a physical design, we need to capture information at that point in time, here we denormalize our data known that we are duplicating data. On the data an order is made a customer will have a specific address. Three months later (or some other time frame), the customer may move. For historical reasons we don't want this order to reflect the new addresss, as that is not where the customer was at the time of the order.

    Denormalized (2nf, capture point-in-time info)

    Customer

    ----------

    (same as 3nf)

    OrderHeader

    -------------

    CustomerNumber

    OrderNumber

    CustomerName

    CustomerAddress

    CustomerCity

    CustomerState

    CustomerPostalCode

    ...

    OrderLine

    ----------

    (leave this as an excercise for others)

    In order to understand the problem domain and the data, we took the design to 3NF. In moving to a physical design, we made decisions, usually business related, and denormalized the data to satisfy the business, legal, or other regulatory requirements.

    I hope that helps clarify what I was trying to communicate earlier.

  • sqlvogel (3/29/2012)


    Sean Lange (3/29/2012)


    Well in truth if you include the customer's current data at the time of the order in the order table you can't possibly be in 3NF because it is a direct violation of 2NF (Eliminate Redundant Data). That is why Lynn says this is a typical and usually very good type of thing to denormalize. It makes sense that you would want to have this historical information but it is no longer 3NF.

    2NF does not mean "eliminate redundant data". Even if it did, I don't think it's the issue here. 2NF means that there are no partial key dependencies on nonprime attributes. It's not obvious to me why having the customer data from the time of the order in the order table would lead to a partial key dependency on a nonprime attribute in that table. If it did then I certainly wouldn't put that data there. My point was precisely that I would want to record the customer's data from the time of the order in a normalized way (e.g. BCNF or 5NF) rather than a denormalized way.

    It is actually the very capture of this point in time data that makes it denormalized. The point I think we all made is that this is a perfectly valid time to have denormalized data. It makes sense to store this information from a point in time perspective. It is however denormalized.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (3/29/2012)


    sqlvogel (3/29/2012)


    Sean Lange (3/29/2012)


    Well in truth if you include the customer's current data at the time of the order in the order table you can't possibly be in 3NF because it is a direct violation of 2NF (Eliminate Redundant Data). That is why Lynn says this is a typical and usually very good type of thing to denormalize. It makes sense that you would want to have this historical information but it is no longer 3NF.

    2NF does not mean "eliminate redundant data". Even if it did, I don't think it's the issue here. 2NF means that there are no partial key dependencies on nonprime attributes. It's not obvious to me why having the customer data from the time of the order in the order table would lead to a partial key dependency on a nonprime attribute in that table. If it did then I certainly wouldn't put that data there. My point was precisely that I would want to record the customer's data from the time of the order in a normalized way (e.g. BCNF or 5NF) rather than a denormalized way.

    It is actually the very capture of this point in time data that makes it denormalized. The point I think we all made is that this is a perfectly valid time to have denormalized data. It makes sense to store this information from a point in time perspective. It is however denormalized.

    I'm not arguing that point, that the data is denormalized. My point is that you should normalize your data to at least 3rd normal form, then denormalize where appropriate based on requirements, performance, etc., as long as you know and understand what is going on and how you are going to handle the duplicated data.

  • Lynn Pettis (3/29/2012)


    Sean Lange (3/29/2012)


    sqlvogel (3/29/2012)


    Sean Lange (3/29/2012)


    Well in truth if you include the customer's current data at the time of the order in the order table you can't possibly be in 3NF because it is a direct violation of 2NF (Eliminate Redundant Data). That is why Lynn says this is a typical and usually very good type of thing to denormalize. It makes sense that you would want to have this historical information but it is no longer 3NF.

    2NF does not mean "eliminate redundant data". Even if it did, I don't think it's the issue here. 2NF means that there are no partial key dependencies on nonprime attributes. It's not obvious to me why having the customer data from the time of the order in the order table would lead to a partial key dependency on a nonprime attribute in that table. If it did then I certainly wouldn't put that data there. My point was precisely that I would want to record the customer's data from the time of the order in a normalized way (e.g. BCNF or 5NF) rather than a denormalized way.

    It is actually the very capture of this point in time data that makes it denormalized. The point I think we all made is that this is a perfectly valid time to have denormalized data. It makes sense to store this information from a point in time perspective. It is however denormalized.

    I'm not arguing that point, that the data is denormalized. My point is that you should normalize your data to at least 3rd normal form, then denormalize where appropriate based on requirements, performance, etc., as long as you know and understand what is going on and how you are going to handle the duplicated data.

    Lynn I was not even thinking you were arguing that point. If anything you and were on the same side all along. 🙂 It is always easier to normalize appropriately then go back and then denormalized as required.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (3/29/2012)


    Lynn Pettis (3/29/2012)


    Sean Lange (3/29/2012)


    sqlvogel (3/29/2012)


    Sean Lange (3/29/2012)


    Well in truth if you include the customer's current data at the time of the order in the order table you can't possibly be in 3NF because it is a direct violation of 2NF (Eliminate Redundant Data). That is why Lynn says this is a typical and usually very good type of thing to denormalize. It makes sense that you would want to have this historical information but it is no longer 3NF.

    2NF does not mean "eliminate redundant data". Even if it did, I don't think it's the issue here. 2NF means that there are no partial key dependencies on nonprime attributes. It's not obvious to me why having the customer data from the time of the order in the order table would lead to a partial key dependency on a nonprime attribute in that table. If it did then I certainly wouldn't put that data there. My point was precisely that I would want to record the customer's data from the time of the order in a normalized way (e.g. BCNF or 5NF) rather than a denormalized way.

    It is actually the very capture of this point in time data that makes it denormalized. The point I think we all made is that this is a perfectly valid time to have denormalized data. It makes sense to store this information from a point in time perspective. It is however denormalized.

    I'm not arguing that point, that the data is denormalized. My point is that you should normalize your data to at least 3rd normal form, then denormalize where appropriate based on requirements, performance, etc., as long as you know and understand what is going on and how you are going to handle the duplicated data.

    Lynn I was not even thinking you were arguing that point. If anything you and were on the same side all along. 🙂 It is always easier to normalize appropriately then go back and then denormalized as required.

    Okay, maybe I am getting a little bit to literal in what I was reading. Have to go find those chill pills my daughters keep telling me I need to take.

  • Sean Lange (3/29/2012)


    It is actually the very capture of this point in time data that makes it denormalized. The point I think we all made is that this is a perfectly valid time to have denormalized data. It makes sense to store this information from a point in time perspective. It is however denormalized.

    Sorry, but that's nonsense. No type of data capture requirement can possibly make the data denormalized. All data, including the point-in-time information about a customer, is capable of being modelled in a normalized fashion (say, BCNF, 5NF or potentially even 6NF if you find the need to do so).

    My point of disagreement with Lynn is that the need to record the customer's address at the time of the order is no good reason at all to denormalize. The customer data example isn't really very interesting but if you aren't familiar with modelling of temporal data then you should definitely check out some of the books and other resources here fort example: temporaldata.com. The literature on this topic is very extensive and covers plenty of examples like the one mentioned here.

Viewing 15 posts - 16 through 30 (of 33 total)

You must be logged in to reply to this topic. Login to reply