Normalize or denormalize?

  • Hi folks,

    I've been questioning myself lately about how I should build the new database for the sales system.

    In the old system (Cobol system), we save every field "as is" to be able to reprint the invoices with 7 years backwards.

    i.e. if we do a sale for a certain customer, from a certain manufacturer the table looks like this:

    InvoiceDetails

    InvoiceDetailID int identity(1,1),

    MfrCode varchar(5) not null,

    CustomerNum varchar(5) not null,

    SalesAmount money

    (This is only a sample to illustrate the facts I'm questioning myself on)

    Now, as a young DBA trying to "Change the way it works" or "Change the world", I've been trying to create a nicely normalized database with all that data for the new system, (it looks like an awesome opportunity to rewrite the DB in the new format). But, when trying to do so, I came across the fact that I cannot save the IDs in the sales table... Since they want to be able to reprint the exact invoice, I MUST save the data as is in the table, and therefore denormalize my data.

    i.e. in my opinion the table should look as follow.

    InvoiceDetails

    InvoiceDetailID int identity(1,1)

    MfrID int not null, -- with a reference (FK) to the Manufacturer table

    CustomerID int not null, -- with a reference (FK) to the Customer table

    SalesAmount money

    Now, I'm asking, what would you guys propose for that kind of "problem" (Seems like a real life problem a lot more then a problem by itself).

    Is there a way to keep the data normalized (with IDs and FKs), or should i go the "old cobol way" and just stop trying to change the world? 😎

    A few colleagues tried to tell me I should normalize the data, and ALSO copy the fields within the table,

    Like that:

    InvoiceDetails

    InvoiceDetailID int identity(1,1)

    MfrID int not null, -- with a reference (FK) to the Manufacturer table

    MfrCode varchar(5) not null, -- Copy of when the invoice was made

    CustomerID int not null, -- with a reference (FK) to the Customer table

    CustomerNum varchar(5) not null, -- Copy of when the invoice was made

    SalesAmount money

    But I do not really like this approach, since it can lead to errors, or we might ask ourselves where to query the data, since it can be duplicated, and then modified in the Manufacturer table, in example.

    I really cannot come to a solution on this, I might be missing something stupid, but I'm really in need of a good discussion of pros and cons on this subject. Hope I get a few gurus to leap on this one! 😉

    Thanks in advance,

    Cheers,

    J-F

  • There ARE situations where denormalizing data is prudent. Data warehouses are one, and in my opinon, this is the other. The "CompletedSales" table, containing sales that are completed and wil not be changed, should contain denormalized data. There is good reason for this because of things that will change over time. For instance the price of an item that was sold today will probably cost more next year, and if you want to have an accurate representation of the old invoice you must have the price saved for the date when the item was sold. I call this type of table a "snapshot" table, which is necessarily denormalized to keep all aspects of the data intact as of the time that record was created. In some cases this is required for auditing anyway. Imagine if an audit found all of your sales records did not synch with accouting's numbers because the prices of many items on those old sales records had gone up and your DB now reported old sales records as totaling more than the accounting department had collected. That would be a huge problem.

    Now, if you have data elements on a sales record that you want to be dynamic, to change on all sales records if a change is required, that's when the normalized data would be handy. As far as I am concerned, I cannot think of anything on a COMPLETED sales record I would like to dynamically change ever.

    Chris

    Learning something new on every visit to SSC. Hoping to pass it on to someone else.

  • J-F,

    If this is an OLTP system, I would go with your proposal with the FK in the InvoiceDetail table. The only thing that you would need to ensure that the Mfr and Customer data isn't overwritten in the parent table when Mfr and Customers change.

    As far as flagging an Invoice as completed, you could have a bit column in the table which can be updated as necessary.

    If this were for a DW, I would then go with Chris' suggestion of denormalizing the data to create tblCompletedSales.

    SA

  • Even in an OLTP system there is a point at which you stop normalizing the data and start denormalizing it. There are trade offs that are made during this process. You are looking at retaining data over a period of 7 years, but also look at the current business needs as well. Do you have users that need to look at this information on a regular basis, such as Customer Service? If they need to look at a 3 month old Sales Order or Invoice for a customer, they need to see the information as it was at the time it was created, not based on current information in the database (price changes, address changes, etc).

    In many cases, the denormalized data is for historical, point in time purposes and will not be something that anyone should be able to edit.

  • I am going to side mostly with you colleagues who said "both". Being able to exactly reproduce a historical document is often a business requirement (and sometimes a legal one also). Yoo cannot ignore that, so you have to preserve that capability in your redesign.

    That will raise the question of what can you get from your redesign, if you still have to keep that old data? The obvious answer is "Relationships". So what you want to add are the key/id fields that allow you to relate these records to the other tables and each other, like CustomerID.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Keep in mind as well that conceptually, higher normalization = greater flexibility and lower normalization = less flexibility. So as part of your redesign you need to consider (specifically around this history part of the requirement) what thing(s) you need to be able to do with the old data.

    If you need to do a LOT of different operations to the "old" data, then that would tend to imply maintaining a lot of the normalization. If the invoice is the only function you need to keep "as is", then you may be able to get away with NO normalization at all, or you may be able to get away with a non-database solution (why maintain something in the DB if the IMAGE/PDF of the invoice will satisfy the requirement). I suspect your solution will end up somewhere in the middle.

    Again- all of this involves right-sizing your solution to the business challenge you want to solve.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • It does come down to business requirements. If you need to be able to regenerate any reports, then images of invoices won't be of benefit.

  • I'll jump on the bandwagon here. When designing a database, I first normalize and see where that gets us. When requirements or performance demand denormalization, there isn't anything wrong with that. We are paid to ensure the application can carry out what the business needs to accomplish, not to create ideal designs.

    You did mention denormalizing and duplicating data. This is an absolute last resort, IMO. I hate duplicating data other than keys in multiple tables. Inevitably it will end up out of synch. Yes, there are times when that rule should be broken. But that should be much more rare than even denormalization.

  • jparra (3/25/2009)


    I'll jump on the bandwagon here. When designing a database, I first normalize and see where that gets us. When requirements or performance demand denormalization, there isn't anything wrong with that. We are paid to ensure the application can carry out what the business needs to accomplish, not to create ideal designs.

    You did mention denormalizing and duplicating data. This is an absolute last resort, IMO. I hate duplicating data other than keys in multiple tables. Inevitably it will end up out of synch. Yes, there are times when that rule should be broken. But that should be much more rare than even denormalization.

    "It Depends" what you consider duplicate data. Is the price of a Product duplicate data on Sales Order or Invoice if the Price is also stored in the Inventory record for the Product? What if the price changes in inventory, do you want that price being reflected in the Sales Order or Invoice created yesterday or three months ago?

  • "It Depends" what you consider duplicate data. Is the price of a Product duplicate data on Sales Order or Invoice if the Price is also stored in the Inventory record for the Product? What if the price changes in inventory, do you want that price being reflected in the Sales Order or Invoice created yesterday or three months ago?

    Well, I was speaking in general terms not specifically to this example. In the situation you discuss, I don't consider that duplicate data. What you have are two different and distinct pieces of data in your example. There is "Current Price", which would be attached to a particular item in inventory. Then there is the "Sales Price" attached to the Invoice Detail, which was the price at the time of sale.

    Carrying a "Total Sales Price" in the Invoice Header record brings up a whole different normalization discussion, of course...:rolleyes:

  • Good response. I was just trying to make sure that everyone had a good idea as to what is "duplicate" data and that which isn't. In a strictly normalized system it would be, regarding the price of a product.

    Hoping this helps with your normalization/denormalization process.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply