Fact Table Update (Design)

  • Niggles wrote:

    What I'm trying to ascertain here is if it's feasible to make the whole model temporal (i.e. all Fact tables & Dimensions) so it can be rolled back in unison within a view. The other option would be a hybrid approach by keeping the dimensions as is (i.e. non-temporal & loaded using a hybrid type1 & type2 approach). With the aforementioned hybrid approach the Fact table could still be wound back (temporal) and referential integrity maintained with the underling dimensions. I'm leaning towards the "All temporal" approach but I feel uneasy about it. Any thoughts?

    With the use of MS Temporal Tables, I don't see why a full temporal approach would create any significant issues.  It's all virtually auto-magic and the difference between the base and history tables is covered up by a system generated view that basically takes care of the who shebang for each table.  The "current" data is always in the table that you would refer to and the history data has been moved to a different table so that it doesn't reduce the "active row count" per page, which helps keep your non-temporal queries nasty fast compared to having the history stored in the "base table".

    Super wide tables will always be an issue for the "full row" type of auditing that MS Temporal Tables or the equivalent "roll your own" solutions would have.

    As a bit of a sidebar, Martin is correct in that you're using terms like Type-1 and Type-2 SCDs incorrectly BUT, I know what you mean.  What you mean isn't about the type of SCD... what you're talking about is you're using those terms as short-cuts to explain the actions taken on the base and history tables that underlie MS Temporal Tables.  In fact, Type 6 SCDs are named that because they either the combination of Type 2 and 4 or, less common, Types 1, 2, and 3.  So I do understand what you mean and don't have to change that on my account.

    With that in mind, I'm using the following as a reference for SCDs.

    https://en.wikipedia.org/wiki/Slowly_changing_dimension

    To me, MS Temporal Tables are "Pure Type 6" SCDs and my "Roll your own" SCDs have a touch of Type 7 Hybrid (depending on if there are any AK's in the base table) without than damned Current_Flag column.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Just thought for the sake of helping anyone reading through these posts that I'd include the following link which is relevant to the discussion.

    Quote "you can forego the “Is this a Type 1 or Type 2 SCD?” debate when using temporal tables. Technically every change in a temporal table is stored like a Type 2, but for queries on attributes where history is not important, a simple SELECT against the table (without the FOR SYSTEM TIME qualifier) works just like a Type 1 SCD and returns just the current version of that row. You get Type 1 simplicity with Type 2 flexibility".

    Note: The final comment in that link also gives a very strong counter argument against using temporal tables for dimensions

    https://www.timmitchell.net/post/2019/04/02/using-temporal-tables-for-slowly-changing-dimensions/#:~:text=To%20use%20a%20temporal%20table%20as%20a%20slowly,just%20like%20you%20would%20on%20any%20non-versioned%20table.

     

    • This reply was modified 2 years, 8 months ago by  Niggles.
    • This reply was modified 2 years, 8 months ago by  Niggles.
  • Hi Jeff,

    This gives me more confidence that the approach is at least viable and I'm not losing my mind :). Thanks a mil for offering that.

    Apologies for the confusion in relation to relaying my thoughts on the whole process, I think it's a bit tricky to vocalise. I think I'll implement it as it quite straight forward and then let it run and test it as proof of concept. I'll post my experience with it here in any case just in case it helps anyone out.

    Thanks,

    N

  • Niggles wrote:

    Apologies for the confusion in relation to relaying my thoughts on the whole process, I think it's a bit tricky to vocalise. I think I'll implement it as it quite straight forward and then let it run and test it as proof of concept. I'll post my experience with it here in any case just in case it helps anyone out.

    I'm thinking that you're thinking out load and that's a good thing.  And, yes... setting up PoP code is absolutely the way to go.  And I REALLY appreciate you sharing to help others out.  It's not often that one has a "ground floor" design opportunity like this.  I've done a small group of PiT tables based on some recent customer requests but not quite to the extent that you have in mind.

    My recommendation would be to keep good notes and copies of your code for test setups, etc and, when you're done, consider writing a series of articles on your grand adventure.  I think that it would really help a whole lot of people out especially when it comes to the ease of PiT queries that all of this will make possible.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi,

    Just to be inclusive I am including a few points that I came across in my research that I'm considering areas of concern in substituting a Dimension Scd2 load approach with a temporal table (In relation to Scd2 I'm referring to historizing a row in the dimension and creating a new version with new surrogate key if a column changes);

    1. SCD's exist to report on the dimension at the point in time the fact occurred. With a temporal table we'd have the active row in the dimension. I think we wouldn't be joining any more on the surrogate key between the fact and dimension. I "think" it would require a function such as below and an "Outer Apply" as shown (I don't believe the For System Time As Of syntax allows passing in a column value in a join directly). If I'm correct with the below I'm not sure it will scale very well. On second thoughts I'm thinking that maybe a join to the temporal table using the "For System Time Contained In" or "For System Time Between" syntax would expose the previously active records in the dimension for each business key and then we could still join on the surrogate key. I haven't been able to find anything online so far so the support here has been great.

    /*************************************************************************************/

    Create Function dbo.fn_Dim_Patient (@Date datetime(7))

    Returns Table

    As Return

    Select Col1, Col2..etc

    From dbo.Dim_Patient For System_Time As Of @Date

    /*************************************************************************************/

    Outer Apply:

    Select t1.ContactDate, t1.ContactID, t2.PatientEthnicity etc. etc.

    From dbo.Fact As t1

    Outer Apply dbo.fn_Dim_Patient (FactDate) As t2

    2) If you wanted dimension attributes as a Type1 change to keep the record count down, temporal tables will still create a record in the history table for the update. It could grow quite large depending on the amount of updates coming through

    Thanks,

    N

     

  • Ok, now that I'm not confused anymore 🙂

    I am familiar with Tim's post, and wish that Microsoft took the temporal table feature a little further past its initial implementation. My personal preference for type-2 dimensions is not to use this feature, for the following reasons:

    • I like to have both an effective and termination date to simplify date-range queries, and with this feature you only get one.
    • I don't like that the data type is datetime. I prefer a date only as that is what I typically use in my Date dimension. A different data type would mean converting the values every time.
    • Control. I rarely track changes to every single attribute in my type-2 dimensions, but you don't have the ability with this feature to be that specific. Every single change will be tracked.

    My opinion: This feature is well-suited if you're really only looking for auditing, but it is not a replacement for the dimensional modelling techniques we've been using up to now with type-2 dimensions. The goal with dimension modeling is to simplify the data model for subsequent queries, and this feature doesn't realize that goal for me...at least not with its current implementation in SQL Server.

  • Hi Martin,

    Cheers for posting your take on it. After discussing at length internally and taking your points (& Jeffs) into consideration I'm of the opinion that making the underlying dimensions temporal is adding a layer of un-necessary complexity and it might not scale well. Making the fact temporal though sounds like a big win for me. I'm setting that up this evening and I'm testing the rollback (i.e. reverting the fact table back to the previous day / week using the For System Time As Of). I'll post any "gotcha's" here anyway 🙂 hopefully there aren't any.

    Thanks again,

    N

    • This reply was modified 2 years, 8 months ago by  Niggles.

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply