September 21, 2015 at 7:31 am
Hello
Can anybody recommend a best practice approach with the following scenario?
I have a fact table of order lines (FactPolicyTransaction)
I have a dimension with products (DimProduct)
The fact holds a Surrogate Key for DimProduct
I want to hold current and historic changes to Products (SCD 2 method)
In my Fact, I want to refer to the current structure of the product i.e. always use the latest structure in reports or cubes
I might also require history at some point
So, I'm thinking of making DimProduct SCD 2 when building using SSIS
How should I handle the Fact though as SCD 2 creates a new Product record with a new SK every tome a change is made
Do I have to keep rebuilding the Fact?
Would it be sensible to (something like) take a copy of the Product and insert this as the new record
Then overwrite the existing whilst retaining the SK
That way the SK in the Fact is consistent, always refers to the latest state and should never need to be updated
Or should I be looking to take another approach?
Thanks
- Damian
September 21, 2015 at 4:52 pm
There's a few different approaches you could take here, and it really just comes down to preference.
The one constant thing is that you will need a type 2 SCD, to track the Product dimension changes. Now in terms of how you'd like to reference both current and historic versions from the fact table is where the preference comes into play.
It would be possible to have one Product dimension (type 2), and store its key in the fact table. You should be able to implement some kind of versioning and create a self-joining view of the Product dimension, in which every version of the product is displayed alongside the most recent (or current) version. (Hope this makes sense...)
My preference is to create 2 dimensions: A Product dimension which contains only current versions, and a Product History dimension which contains all the type 2 records (including the current version). I like this approach because it eliminates the need for views (I don't like views very much), and you will now have 2 foreign keys in your fact table (one for each dimension). I also find this approach easy to explain to end users, and simplistic enough for them to understand and use... even in an ad hoc fashion.
With any type 2 SCD, the key in your fact table should reference the version of that entity (Product in this case) at the time of the transaction. You should never have to rebuild your fact table, unless you change or rebuild your type 2 dimension.
Hope this helps.
September 22, 2015 at 2:30 am
So, in this instance, create 2 versions of the product Dim (say DimProduct and DimProductHistory)
DimProduct would really be SCD1 and be referenced by my Fact
DimProductHistory would be SCD2 and used if I want history
If I do, I reference the SK for the product at that point in time
I then create 2 versions of the insert within the SSIS
Does that seem like a good approach?
Thanks
- Damian
September 22, 2015 at 6:50 am
Yes, on all accounts.
In terms of your last comment: I typically process the type 2 (history) dimension first, and then perform a merge on the current version records with the "Current" dimension.
September 22, 2015 at 7:05 am
Thanks Martin, I will give this a go
Damian.
- Damian
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply