June 6, 2017 at 1:57 pm
Hi, we are planning a data warehouse that will be loaded three times a year. This is a single-purpose warehouse for which we currently have no expansion plans. We are using the Kimball methodology.
We are categorizing the columns in our dimension tables into type 1 and type 2. One dimension table has a fairly long list of columns, a mix of type 1 and type 2, and only about 1,000 rows. Rather than updating the rows that have only type 1 changes, and writing new rows when type 2 changes occur, would it be appropriate to simply treat all rows as if they had type 2 changes over the last few months? In other words, with every three-times-a-year data warehouse load, all current rows for this dimension would be given an end effective date and current flag set to "No", and all new rows would be loaded.
If this is not a good idea, can you tell me why? Thanks for your help!
June 6, 2017 at 2:17 pm
Before answering your question, my own would be why you would want to drive type-2 changes for every record. Are you trying to save 20 lines of code and the effort that goes along with it, or is there some other reason?
Always think about the future when developing these kinds of systems and processes. A 1,000 record table will not stay a 1,000 record table...and in future it may be decided that it should be loaded more frequently. If you go ahead with the notion of driving type-2 changes for everything in every load, these potential (even though unlikely) future scenarios would cause problems.
Also think about other people working on this system in the future, when you are no longer there. Will it make sense to them that this is happening, or would it possibly create confusion?
There isn't necessarily a right or wrong answer here. In my opinion it is a case of "is there a good enough reason to not follow standard protocol".
June 6, 2017 at 2:25 pm
Yes, I think the motivation is to save lines of code. (Its not actually my idea.) The dimension has over 20 columns, most of which are type 2, so it could save a fair amount of code.
Thanks for the input.
June 6, 2017 at 2:31 pm
Nicole Garris - Tuesday, June 6, 2017 2:25 PMYes, I think the motivation is to save lines of code. (Its not actually my idea.) The dimension has over 20 columns, most of which are type 2, so it could save a fair amount of code.Thanks for the input.
Not a good enough reason in my opinion.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply