Slowly changing fact?

  • The company I work for track refinery capacities around the world and what to

    put this data into a data warehouse. At first glance it looks simple enough,

    every quarter they take a snapshot of each refineries capacity and that is

    the fact table and the dimensions would be owners, location etc.

    The problem I have though is that the capacity snapshot is not a fact, it is

    the best information at the time, and is subject to change, even going back

    years, when better data is available. The business has a need to see best

    current data as well as being able to know what the best current data was at

    a certain time.

    Would the best approach be to make the fact table slowly changing and then

    have validity data as dimension, or is there a better approach?

  • Hi,

    I would more information before suggesting something..

    1. What is the approx db size (in terms of no. of fact rows) in each quarter?

    2. How much historical data is required? (how many snapshots to be kept in the dw for analysis)

    Thanks,

    Vijay.

    Warm Regards,
    Neel aka Vijay.

  • Thanks Vijay,

    What it would look like is an initial load of 150k rows for the historical (30 years), then there would be a couple of rows a day of the historical data being edited, then 1,200 rows added once a new quarters snapshots are loaded.

    Regards,

    James

  • Without knowing the structure of the data at hand, I'd say just load it, edit it when necessary, and add to it whenever you need to. If necessary, you can split the information into more normalized tables. I wouldn't even worry about DW on it right now. You may never need it...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If on the other hand, you are already committed to a DW approach, then I would say that your original idea is on the right track. I would simply have two (2) date dimensions: one as the date for which the estimate is being made, and then a second that that represents the date on which the estimate was made.

    This makes the accumulation of new facts much easier and more efficient and has the added benefit of providing information for new kinds of analysis (that is, "how have the estimates changed over time?").

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply