Related data within the same facts

  • How do you store facts within which data is related? And how do you configure the measure? For example, I have a data warehouse that tracks the lifecycle of an order, which changes states - ordered, to shipped, to refunded. And for a state like 'refunded', it is not always there. So in my model, I am employing the transaction store model, so every time the order changes state, it is another row in the fact table. So, for an order that was placed in april, and refunded in may, there will be two rows - one with a state of 'ordered' and another with a state of 'refunded'. So if the user wanted to see all the orders placed/ordered in april, and wanted to see how many of 'those' orders got refunded, how would he see that? Is this a MDX query that will be run at runtime? Is this is a calculated measure I can store in the cube? How would I do that? My thought process is that it should be a fact that the user can use in a pivottable, but I'm not sure.....

  • Based on what you've provided, I would suggest that you would use an accumulating snapshot fact table. The row is revisited when there is a change. The record has a series of dates which are updated as the record is updated. There is a creation date and whatever other dates you might need. For example, in one of my more complex ones, I have creation date, first contact date, shipped date, completed date, cancelled date, and closed date. I have included the last three because no single work order will have all three. At first they start out 'N/A'. When the order is completed, the complete and closed are set to the same date. If the order is cancelled the cancelled date and closed dates are set. The untouched date remains 'N/A'. In the event an order is completed or cancelled w/o a date, the value is 'UNK'.

    Although Ralph Kimball says that the accumulating snapshot fact table is the least common of the three he identifies, it has been the most common for me.

  • I agree with Ron regarding the accumulating snapshot fact table. It is the correct way to go about designing this fact table. Don't be afraid to update fact table data, even though there is often a mindset that facts shouldn't be updated. It's often the only way to get measures to report correctly without a lot of flag setting and filtering, which is prone to error.

    In addition, if you do need to store a copy of your order transaction in every state it was in, do this in a different fact table. This fact table would be more of an audit trail and would not be additive so you wouldn't use it to report sales figures, but it would serve it's own purposes. Anyone wanting to see every state that order was in and the transitions it took would have visibility into that.

    LinkedIn - http://www.linkedin.com/in/carlosbossy
    Blog - http://www.carlosbossy.com
    Follow me - @carlosbossy

  • MR-216760 (4/23/2011)


    How do you store facts within which data is related? And how do you configure the measure? For example, I have a data warehouse that tracks the lifecycle of an order, which changes states - ordered, to shipped, to refunded. And for a state like 'refunded', it is not always there. So in my model, I am employing the transaction store model, so every time the order changes state, it is another row in the fact table. So, for an order that was placed in april, and refunded in may, there will be two rows - one with a state of 'ordered' and another with a state of 'refunded'. So if the user wanted to see all the orders placed/ordered in april, and wanted to see how many of 'those' orders got refunded, how would he see that? Is this a MDX query that will be run at runtime? Is this is a calculated measure I can store in the cube? How would I do that? My thought process is that it should be a fact that the user can use in a pivottable, but I'm not sure.....

    Allow me to suggest a different approach.

    First, let's diferenciate between core tables a.k.a. FACT/DIM and what I call "delivery layer" ones which include but are not limited to aggregation tables, summary tables, materialized views, cubes, etc.

    The initial issue is how to store data into the core tables. FACT tables are supposed to store information about Business Events and States; following that simple definition the core FACT table in this particular case has to include a row for each business-event/state... in the particular scenario of "ordered" then "refunded" two rows are needed to track such events/states in the target FACT_ORDER table.

    Second issue is how to deliver that information to the user. The easiest way to do it in a consistent way would be to create a view on top of FACT_ORDER let's call it V_FACT_ORDER_HISTORY; the logic for this view is pretty easy, just show all states of the order, ordered by date of the event.

    Bottom line is, FACT tables should reflect all Business Events and States.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply