Slowly Changing Dimension My Style

  • I'm going nuts trying to use the SCD Data Flow Component as I'd like to do things a bit differently than this transform seems to be configured to handle.

    Here is what I'm trying to do:

    1) I'd like my current records inserted with a RowExpiredDate value of 999-12-31 12:00:00 rather than NULL

    2) I'd like to add a DateKey that is generated from the RowEffectiveDate in YYYYMMDD integer format (to join to a Date dimension)

    3) I'd like to update a RowCurrentStatus to either 'Current', 'Inferred' or 'Expired' based on the flow results out of the SCD Data Flow Component

    I did see some interesting properties of the SCD Data Flow Component:

    CurrentRowWhere  set as [RowEffectiveDate] IS NOT NULL AND [RowExpiredDate] IS NULL

    And on the OLE DB Command generated after a Derrived Column coming from the SCD Data Flow Component:

    SQLCommand: UPDATE [dbo].[Dim_CostCenter] SET [RowExpiredDate] = ? WHERE [CostCenterNK] = ? AND [RowExpiredDate] IS NULL

    All told, however, I am not sure how to configure all these properties and have the SCD Data Flow Component identify changed rows, process historical changed records and manage SCD2 correctly.

    Any prompt configuration tips to accomplish these requirements is most welcomed!

     

  • SOLVED!!

    This took some exploring and lots of trial and error to figure out, so I'll share my insights:

    Post SCD wizard - using it's generated objects:

    1. Changed the SCD Data Flow Component's CurrentRowWhere property to

     [RowEffectiveDate] IS NOT NULL AND [RowCurrentStatus] = 'Current' AND [RowExpiredDate] = '9999-12-31 12:00:00'

    2. Changed the OLE DB Command after the Derived Column on the Historical Attribute Inserts Output path to

     UPDATE [dbo].[Dim_CostCenter] SET [RowExpiredDate] = ? , [RowCurrentStatus] = 'Expired'

     WHERE [CostCenterNK] = ? AND [RowCurrentStatus] = 'Current' AND [RowExpiredDate] = '9999-12-31 12:00:00'

    3. Changed the Derived Column 1 after the Union All on the New Output path to add additional derived columns:

     RowExpiredDate <add as new column> (DT_DBTIMESTAMP)("9999-12-31 12:00:00") database timestamp

     RowCurrentStatus <add as new column> "Current" string[DT_STR]

     RowChangeDateKey <add as new column> YEAR(@[System::ContainerStartTime]) * 10000 + MONTH(@[System::ContainerStartTime]) * 100 + DAY(@[System::ContainerStartTime]) four-byte signed integer [DT_I4]

    4. Changed the final Insert Destination to map newly added columns (added from Derived Column 1 component)

    If anyone has any other insights, I'd really welcome your thoughts!

     

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply