June 9, 2014 at 10:16 am
Hi All,
I have been working over the last couple of weeks on implementing a Partitioned View for my DW. (Just to mention, I had to go this route because my edition of SQL Server does not support table partitioning.)
My problem hit me this morning as I was working on the base tables of the partitioned view. I have 9 base tables behind the Partitioned View, and a nightly ETL process that updates only the most current one. As I was testing a BO report (already in prod) against this new partitioned view, I realized that the results came back perfect for the current year (the tables are broken up by year), but the previous years either returned no records or the data was jumbled. Then it hit me: I have surrogate keys in my DW! When I rerun the ETL to rebuild the most current year's table, all of its keys are up to date. Since I do not rerun the other base tables, they still contain old keys.
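For anyone unfamiliar with the setup being described, a local partitioned view over per-year base tables looks roughly like this. All table and column names here are made up for illustration; the CHECK constraint on the partitioning column is what lets the optimizer skip base tables that can't satisfy a query's predicate:

```sql
-- Hypothetical per-year fact tables (sketch, two years shown).
CREATE TABLE dbo.FactEnrollment2013 (
    SchoolYear int NOT NULL CONSTRAINT CK_Fact2013 CHECK (SchoolYear = 2013),
    StudentKey int NOT NULL,          -- surrogate key into a DimStudent table
    Credits    decimal(5,2) NOT NULL
);

CREATE TABLE dbo.FactEnrollment2014 (
    SchoolYear int NOT NULL CONSTRAINT CK_Fact2014 CHECK (SchoolYear = 2014),
    StudentKey int NOT NULL,
    Credits    decimal(5,2) NOT NULL
);
GO

-- The partitioned view: a UNION ALL over the base tables.
CREATE VIEW dbo.FactEnrollment
AS
SELECT SchoolYear, StudentKey, Credits FROM dbo.FactEnrollment2013
UNION ALL
SELECT SchoolYear, StudentKey, Credits FROM dbo.FactEnrollment2014;
```

The problem described in the thread follows directly from this shape: if the dimension is rebuilt and its surrogate keys reassigned, only the base table reloaded that night carries the new key values.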
Does anyone have a suggestion for how to work with an issue like this? Are there any articles/forums on using Partitioned Views with surrogate keys?
Thanks in advance.
June 9, 2014 at 11:56 am
To ensure I fully understand - you're completely tearing down your dimensions and rebuilding them (hence the new keys) on each daily load?
If so, there's not much you can do other than update all of the other base tables. Alternatively, don't tear down the dimensions. Easier said than done, I know, as I've worked in places where we had to do that too - we were basically forced to rebuild large parts (read as "nearly all") of the DW on a daily basis. The best we got to was not tearing it down daily, but doing a full tear-down and refresh weekly.
Steve.
June 9, 2014 at 12:09 pm
Unfortunately, YES. I am having to rebuild the dimensions every night because the data in them could change on any given day. If I cannot do anything in my situation other than rebuild everything, are there any other suggestions for what I can do? As of now, rebuilding the dims/facts every night takes roughly 3 1/2 - 4 hours. With the partitioning I was getting that down to 30 mins.
June 9, 2014 at 12:22 pm
When you say the data in them could change any day - this is true of many slowly changing dimensions, but we don't rebuild those daily. Typically the key value you've identified (e.g. Customer Name) remains constant, while attributes related to it may change (address, city, ZIP, etc.). That doesn't require a full tear-down of the dimension.
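A minimal sketch of that idea, using a hypothetical DimCustomer: the descriptive attributes are overwritten in place (a type 1 change), while the surrogate key and the business key never move, so the facts keep pointing at the right rows:

```sql
-- Sketch only; table/column names are assumptions, not from the thread.
-- SCD type 1: overwrite changed attributes, keep the surrogate key intact.
UPDATE d
SET    d.Address = s.Address,
       d.City    = s.City,
       d.Zip     = s.Zip
FROM   dbo.DimCustomer   AS d
JOIN   staging.Customer  AS s
       ON s.CustomerID = d.CustomerID   -- business (natural) key, not the surrogate
WHERE  d.Address <> s.Address
    OR d.City    <> s.City
    OR d.Zip     <> s.Zip;
```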
I'm assuming the above isn't your case, but just on the off chance it is: are you confident you've identified the correct unique identifier ('key') for your dimension(s)?
As mentioned, I've seen a few cases where the dim values truly did change, markedly, over time (e.g. delivery status), but to be truthful, I still think we hadn't approached the design correctly and could have managed those status changes differently.
Steve.
June 9, 2014 at 12:37 pm
Well, that is a good point. I suppose these dims do NOT need to be torn down and rebuilt every night. For example, I have a Student dim that could in theory get a new record any day, but it's probably more realistic to expect one every once in a while during the school year, and certainly at the beginning of a new year.
Let me ask you this. If I do not rebuild the dims every night, then I can continue to rebuild just the current base table of the partitioned view, and everything will match. But when it comes time to update my dims (e.g. a new student is entered), what do I do with the base tables from the previous school years? None of the keys will match there. I'm assuming I would need to rebuild all of them so they contain the updated keys, right?
June 9, 2014 at 12:44 pm
I'm sort of not going to answer your question directly, but I hope you might get what you want indirectly 🙂
If you're working with, say, Student, Course, Room, etc., then *to me* none of your dimensions would likely ever require a full tear-down. New records are either identified by the source system, or you manage this yourself (via Lookups in SSIS perhaps, or simply left outer joins if you're using TSQL), and the same approach is used for records that have been modified.
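The left-outer-join approach mentioned above might look like this in TSQL. The schema is hypothetical (assuming DimStudent has an IDENTITY surrogate key and StudentID as the business key); the point is that existing rows are never touched, so their surrogate keys survive:

```sql
-- Sketch: insert only source rows whose business key isn't in the
-- dimension yet. Existing rows, and their surrogate keys, are untouched.
INSERT INTO dbo.DimStudent (StudentID, StudentName)
SELECT s.StudentID, s.StudentName
FROM   staging.Student AS s
LEFT   JOIN dbo.DimStudent AS d
       ON d.StudentID = s.StudentID     -- business key match
WHERE  d.StudentID IS NULL;             -- no match => brand-new student
```

In SSIS the equivalent is a Lookup transformation on the business key, routing "no match" rows to an insert and matched-but-changed rows to an update path.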
Steve.
June 9, 2014 at 12:51 pm
Let me ask you this. If I were to never tear down any dim and instead used incremental loading in my ETL process to insert/update/delete, would that be a way to avoid altering the existing surrogate keys? New records would just get new keys, and previously existing keys would always remain?
June 9, 2014 at 1:40 pm
Yes (but, being pedantic, there wouldn't be a 'delete' per se on the dimensions. You might flag a record instead - deleting isn't much fun, since you'd then need to delete from your fact(s) too).
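Putting the insert, update, and "flag instead of delete" pieces together, a single MERGE statement is one common way to express this. A sketch, again with an assumed schema (an IsActive flag column on the dimension is an assumption, not something from the thread):

```sql
-- Sketch: incremental dimension load that preserves surrogate keys.
MERGE dbo.DimStudent AS d
USING staging.Student AS s
    ON d.StudentID = s.StudentID            -- business key
WHEN MATCHED AND d.StudentName <> s.StudentName THEN
    UPDATE SET d.StudentName = s.StudentName        -- type 1 overwrite
WHEN NOT MATCHED BY TARGET THEN
    INSERT (StudentID, StudentName, IsActive)
    VALUES (s.StudentID, s.StudentName, 1)          -- new row, new key
WHEN NOT MATCHED BY SOURCE THEN
    UPDATE SET d.IsActive = 0;              -- flag, don't delete
```

Note the WHEN NOT MATCHED BY SOURCE branch only makes sense if the staging table is a full snapshot of the source; with a delta feed, leave it out.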
Steve.
June 9, 2014 at 1:44 pm
Great. I am going to start working on the dev for this. Appreciate the help.