October 5, 2009 at 6:53 am
The company I work for at the moment uses a (self named) data warehouse that has several flaws ...
To address these, a new project is started but I have serious doubts about the basic assumptions and I'd like some opinion...
First, to give an idea of the situation
- this "Data Warehouse" does not include any time dimension and has therefore no concept of "slowly changing dimension"
- Dimension tables use business keys as primary keys (no surrogate key)
We had an inception meeting where the team lead exposed the base ideas and one of them strike me as probably wrong.
Because of the front-end tool they use, our end users end up creating some reports that include MDX using member values that may be ambiguous, like [Paris] without specifying whether it's [Paris] in [Texas] or [Paris] in [France].
The "solution" prescribed is to use (alphanumeric) business keys as Member Keys in our dimensions/cubes and ask our end user to refer to the member key in their MDX: &[PAR1] for Paris (France).
I have read several books from Kimball but not practiced much and this strikes me as being seriously flawed because
- as a matter of principal, I would use surrogate keys in my dimensions (no business key)
- for the leaf level of my dimension/hierarchy, I would use the surrogate key for member key and would not dream of asking users to use this
My question is therefore
"Am I right in thinking that the Member Key at the leaf level of my Dimension/Hierarchy should be my surrogate key and cannot be any sort of business key?"
Many thanks for reading so far! :doze:
October 6, 2009 at 5:43 am
Hi Eric,
- this "Data Warehouse" does not include any time dimension and has therefore no concept of "slowly changing dimension"
If what you're saying is that the proposed "solution" doesn't capture any datetime attributes at all, then I would definately be raising this with the team lead. The datamart will not be of much business use if there's no way of querying when events took place.
If you're saying however, that because there is no separate date dimension in your datamart, an SCD can't be implemented, then that isn't strictly correct.
In some dimensional model designs, what happens is that a special DateKey and TimeKey are used leaving you with a type of "degenerate" dimension (key is only attribute so stored with associated facts and dimensions). This is possible because the key itself is an integer representation of the date. Having said this, I'd argue that a well designed datamart should have both a time and a date dimension given that they are an integral part of practically any user query. They are likely to want to query events in Q1 or this week last year and they will want easily readable results returned rather than a key. And whats more because they are computed they are very easy to implement.
"Am I right in thinking that the Member Key at the leaf level of my Dimension/Hierarchy should be my surrogate key and cannot be any sort of business key?"
Whilst it's a good idea for the business key to be one of the dimension attributes, it shouldn't be the logical key for the dimension. There are a few reasons for this which you have probably read about. 1) If you want to capture history with a Type 2 SCD then the need to use a surrogate key will become apparent! 2) Surrogate keys tend to perform better as they will generally be sequentially numbered integers (as opposed to your business key example "PAR1") 3) They protect the datamart from changes in the source system.
The example you have pointed out is a classic reason why alot of users will expect the business key to remain in the dimension. If they knew their way around source systems inside out and are familiar with invoice ranges, model numbers etc, then they'll want to be able to query for these.
HTH
Kindest Regards,
Frank Bazan
October 6, 2009 at 5:50 am
Frank,
Thanks for your answer, it confirms my impressions.
I know about degenerate dimensions but I don't think there is one in this datamart/datawarehouse regarding dates.
It's wonderful, there is lots of scope for improvements!
😉
Cheers
Eric
October 7, 2009 at 6:33 am
All dimensions should have surrogate integer keys as the primary
key. The business key is also left in and usually named like
"CustomerNumberBKey" so users can see it easier.
As the previous post said, you need surrogates to implement
SCDs, and you need them also to associate the appropriate
dimension record to a fact table as a foreign key.
If your data warehouse has a middle relational layer (mine does,
I fought for it, hard). Then all you need is to reengineer your
dimension and fact ETL processes.
October 8, 2009 at 5:41 am
Just to add, the degenerate key is generally the business key, such as an OrderId, that does not fit into a dimension and where creating a single dimension would make no sense for a single column. When adding date keys to the fact table and are using 20091007 to represent 10/07/2009, this is generally referred to as a smart key. This should still point back to a date dimension though. To force the users to have to interpret the data is generally considered bad design.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply