December 8, 2011 at 5:00 am
Hi Guys,
Am just lost in coming up with an idea of using SCD type 2 on table structure below.
create table TablePanelists
(
pDate date
,PType varchar(10)
,email varchar(255)
)
Note
- pDate and PType are the table identifiers.
- For every pDate and PType, there are at least 3 occurrences of email.
select '10/01/2011' as pDate, 'A' as PType, 'le@abc.com' email
union
select '10/01/2011' as pDate, 'A' as PType, 'ka@abc.com' email
union
select '10/01/2011' as pDate, 'A' as PType, 'pi@abc.com' email
- My question is how can I place SCD type 2 on email column?
- My approach of updating the table is truncate and load, what other best approach I can adopt to keep historical data.
December 8, 2011 at 10:01 am
First you need to identify the business key, the column or columns that make the data unique in your source system. You say pDate and PType are identifiers but they are not unique. I’m not sure what the sample data means so I can’t guess at a possible business key.
The remaining non-business key columns from your source system will be the changing attributes in your dimension.
You will want to add two more things to your dimension. The first is a surrogate key. This is usually an auto number identity column that has no meaning or relationship to the business key. For a SCD2 you will also want start and end date columns to track when changes happened.
As far as ETL you won’t want to truncate and reload. Instead you need to bring in the rows from your source system and compare them to the dimension based on the business key (BK). There are three basic actions to take:
1.If there is a BK in the source system that is not in the dimension, insert that row
2.If there is a BK in the source system and in the dimension and the changing attribute columns have changed, expire the existing row and insert a new row
3.If there is a BK in the source system and in the dimension and the changing attribute columns are the same, do nothing
Kimball’s Data Warehouse Toolkit is an excellent reference.
http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247
December 8, 2011 at 12:40 pm
Olalekan Adewale (12/8/2011)
- My question is how can I place SCD type 2 on email column?- My approach of updating the table is truncate and load, what other best approach I can adopt to keep historical data.
For starters you do not place SCD on a particular column but on a particular table to track changes on the particular table rows.
In general you accomplish Type 2 SCD by adding three column to your table, these columns are:
- ValidSince date
- ValidUntil date
- CurrentValue flag
As you can see the standard implementation of Type 2 SCD colides heads on with an ETL strategy based on truncate and load - any chances of implementing a "merge" based strategy?
Note: Please note that in this particular case "merge" doesn't actually mean merge but an ETL process that updates the last current row by setting ValidUntil to getdate() and CurrentValue to Null while inserting the new row.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.December 9, 2011 at 12:06 am
PaulB-TheOneAndOnly (12/8/2011)
Olalekan Adewale (12/8/2011)
- My question is how can I place SCD type 2 on email column?- My approach of updating the table is truncate and load, what other best approach I can adopt to keep historical data.
For starters you do not place SCD on a particular column but on a particular table to track changes on the particular table rows.
You don't? You can have columns that maintain history with SCD2, but you can have other columns in the same table that don't keep history and are thus updated with SCD1.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
December 9, 2011 at 12:10 am
Eric Klovning (12/8/2011)
First you need to identify the business key, the column or columns that make the data unique in your source system. You say pDate and PType are identifiers but they are not unique. I’m not sure what the sample data means so I can’t guess at a possible business key.
Is the BK unique in a SCD2 table?
Say that CustomerID is the business key. And a customer changes address.
Then you get something like this:
SELECT 1 AS SK_Customer,'12345' AS CustomerID, 'M' AS Gender, 'Koen Verbeeck' AS FullName, 'MyStreet 1' AS Address, '2008-01-01' AS ValidFrom, '2011-11-31' AS ValidTo, 0 AS CurrentValue
UNION
SELECT 2 AS SK_Customer,'12345' AS CustomerID, 'M' AS Gender, 'Koen Verbeeck' AS FullName, 'MyStreet 999' AS Address, '2011-12-01' AS ValidFrom, NULL AS ValidTo, 1 AS CurrentValue
So the surrogate key changes for each row (and thus for each version), but the business key remains the same.
edit: aargh, why is there a scrollbar in the code window?
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
December 9, 2011 at 3:32 am
Thanks all,
Your contributions has re-sharpen how I see SCD type 2. Am only trying to base my scd type 2 on 3 column table where two of the columns are row identifier, while intend using the last as scd. I now see it's not possible to scd on just one column. Will go back to my drawing board, will device a method of update and insert.
Many thanks
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply