How do you Code Slowly Changing Dimensions

  • I'm curious to know what is the most popular method.

    _____________________________________________________________________________MCITP: Business Intelligence Developer (2005)

  • I've done it with all of the above, the most common was the SCD out of the box. But I have done it manually with T-SQL, and I have played with the Kimball Method SCD on Codeplex. I like that one. I want to spend more time with it because I think it will become my favorite. But always keep in mind third-party tools are not always allowed to be used by policies.

    CEWII

  • I try to avoid the built-in SCD component in SSIS for performance reasons and policies do not allow custom or 3rd party components. So I'm basically stuck with T-SQL. I've done type I and type II in T-SQL, but usually it is just type I. In 2005, I do this with a left join on the destination table on the business key, to determine updates and inserts. Inserts are inserted with the fast load option, updates are written to a temp table so that I can perform a set based update.

    In 2008, I simply use the MERGE statement.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • I use Hashbytes.

    1) Cast any non-string data to string values

    2) Use Hashbytes function(create in either a VB script or extract query) and create a HashByteType1 value stored in a HashByteType1 column by concatenating all of the Type1 columns together

    3) Use Hashbytes function(create in either a VB script or extract query) and create a HashByteType2 value stored in a HashByteType2 column by concatenating all of the Type2 columns together

    4) Lookup on Business Key(s) to determine inserts or updates...return your SourceHashbyteType1 and SourceHashbyteType2 values

    5) If it's an insert then store the row with both HashByteType1 as SourceHashbyteType1 and HashbyteType2 as SourceHashbyteType2 values in your table

    6) If it's an update then compare your HashbyteType1 value with your SourceHashbyteType1 value and your HashbyteType2 value with your SourceHashbyteType2 value and handle accordingly (Set update for Type1, Multicast to be inserted in dimension and expire old records for Type2's)

    Word of warning. I'd recommend concatenating your columns with a pipe in between since a Concatenated Hashbyte value of 0 and 1 will be the same as 1 and 0 without the pipes.

    Also, the VB script component must be used if your source is not SQL2K5 or higher.

    The SCD component really shouldn't be considered unless the anticipated data sets are going to be small.

  • Apart from being limited to Type 1,2 and 3 The SSIS SCD component does not support Type 4 and 6...which I believe If the need arises The BI developer has no option but T-SQL.

    Raunak J

  • This was removed by the editor as SPAM

  • Apart from being limited to Type 1,2 and 3 The SSIS SCD component does not support Type 4 and 6...which I believe If the need arises The BI developer has no option but T-SQL.

    Based on the breakout of the SCDs in Wiki, the only addition in 6 from 2 is the use of the current flag. In this case use a boolean field with a default of yes and an update on the end date column to set the value to no. Then you are still able to use the component in SSIS.

    I'm not necessarily sold on the use of the inherent component based on what I have read here, but didn't want it ruled out unnecessarily.

  • well, for now I'm going to use the Kimball SCD as amongst other things it doesn't destroy the data flow when I need to make a change.

    Also it has a couple cool features, such as auditing that make it really useful, and I'm from free from any 3rd party policies - so for me this is the winner.

    _____________________________________________________________________________MCITP: Business Intelligence Developer (2005)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply