October 10, 2014 at 7:50 am
Hi
My first DW project is still in the development stage. My ETL package refreshes all the dimensions every time it runs, retiring old rows, inserting new rows, or updating current rows as Type 1 or Type 2 dictates. All good.
I then start the processing for my fact tables. My fact ETL currently truncates the fact table, pulls the new dataset, does all the lookups for the dimension surrogate keys, and then populates the fact table. My understanding is that fact tables should not really be truncated and repopulated, but rather loaded incrementally, with historical rows kept and marked inactive.
I load source data into staging dimension tables, which are truncated first.
I then ETL from these staging dimensions to build my dimension tables.
I then pull fact data from my source and ETL it to my fact table (should I have a staging table for this too?).
My question is: what is the best way to do this? Should I be retiring rows in my fact table in a similar manner to how I retire rows in my dimensions, or is there a special way to do this? (Note: I don't currently use the MERGE statement. Should I?)
Thanks
November 14, 2014 at 5:09 am
That is a huge IT DEPENDS!
If you can identify a reliable delta of changed records from the source, then I would look at an incremental pattern (and don't forget about deleted records).
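A minimal sketch of that incremental pattern, assuming the source exposes a reliable LastModified timestamp. Every object and column name here (etl.LoadWatermark, StgFactSales, FactSales, DimDate, DimProduct, SourceKey, IsActive) is a made-up placeholder, not your schema:

```sql
-- All object and column names below are hypothetical placeholders.
DECLARE @LastLoad datetime2 =
    (SELECT LastLoadDate FROM etl.LoadWatermark WHERE TableName = 'FactSales');

-- Insert only rows changed since the previous load,
-- resolving dimension surrogate keys on the way in.
INSERT INTO dbo.FactSales (DateKey, ProductKey, Quantity, Amount, SourceKey)
SELECT dd.DateKey, dp.ProductKey, s.Quantity, s.Amount, s.SourceKey
FROM   dbo.StgFactSales AS s
JOIN   dbo.DimDate      AS dd ON dd.FullDate  = s.SaleDate
JOIN   dbo.DimProduct   AS dp ON dp.ProductBK = s.ProductCode
                             AND dp.IsCurrent = 1   -- current Type 2 row only
WHERE  s.LastModified > @LastLoad;

-- Deleted records: soft-delete facts whose source rows have gone.
-- NOTE: this NOT EXISTS check is only valid if staging holds a FULL extract;
-- with a delta-only feed you need an explicit list of deleted keys instead.
UPDATE f
SET    f.IsActive = 0
FROM   dbo.FactSales AS f
WHERE  NOT EXISTS (SELECT 1 FROM dbo.StgFactSales AS s
                   WHERE s.SourceKey = f.SourceKey);

-- Move the watermark forward for the next run.
UPDATE etl.LoadWatermark
SET    LastLoadDate = SYSDATETIME()
WHERE  TableName = 'FactSales';
```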
If you can't, then you have no choice but to do a record-by-record compare between the incoming data and the existing DW data. If your datasets are large, you should look at using MERGE with an OUTPUT clause, as the database engine is far better at this kind of comparison than RBAR processing in SSIS.
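A hedged sketch of MERGE with OUTPUT for that compare-everything case (again, all table and column names are invented for illustration):

```sql
MERGE dbo.FactSales AS tgt
USING dbo.StgFactSales AS src
      ON tgt.SourceKey = src.SourceKey          -- the business/source key
WHEN MATCHED AND (tgt.Quantity <> src.Quantity
               OR tgt.Amount   <> src.Amount)   -- guard NULLs in real code
    THEN UPDATE SET tgt.Quantity = src.Quantity,
                    tgt.Amount   = src.Amount
WHEN NOT MATCHED BY TARGET
    THEN INSERT (SourceKey, Quantity, Amount)
         VALUES (src.SourceKey, src.Quantity, src.Amount)
WHEN NOT MATCHED BY SOURCE
    THEN UPDATE SET tgt.IsActive = 0            -- retire, don't delete
OUTPUT $action, inserted.SourceKey;             -- audit trail of what happened
```

One design note: WHEN NOT MATCHED BY SOURCE considers every target row, so on a large fact table you will usually want to restrict the target side (for example, to a date range) before letting MERGE retire rows.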
If you have a high volume of changes, you may want to look at dropping (or disabling) indexes prior to the update and then rebuilding them afterwards.
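That disable/rebuild step looks something like this (the index name is hypothetical; check your own index list first):

```sql
-- Disable nonclustered indexes before a bulk load (index name is made up).
-- Never disable the clustered index: that makes the table inaccessible.
ALTER INDEX IX_FactSales_DateKey ON dbo.FactSales DISABLE;

-- ... bulk insert / update the fact table here ...

-- REBUILD re-enables a disabled index and rebuilds it in one step.
ALTER INDEX ALL ON dbo.FactSales REBUILD;
```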
Truncating the whole table may mean losing historic Type II changes.
You may need more than one staging zone; I have seen systems with four: Source Extract, Data Cleanse, PreStage (3NF), and Staging (Type I facts and dims), before the final insert/update into the DW.