August 17, 2007 at 3:04 am
Hi.
I'm pretty new to dw and currently having some problems with building conformed dimensions. I'm trying to conform the payment code dimension from 2 different source systems.
Below is an extract of some of the recs stored in the first source (A):
paycode | description |
10 | cheque |
11 | cash |
12 | direct |
22 | internet |
From another source system (B), the recs are as such:
paycode | description |
CH | cheque |
CA | cash |
AX | axs |
My conformed dimension on this payment code would consists of a surrogate key, paycode (natural key) and the payment description. However, seeing that the natural key from both sources are stored differently (one as numerals and one as alpha codes), what is the best way of doing this?
Do i store natural key as numerals and for payment codes which are available in B but not in A, create a new natural key for that code?
TIA!
August 17, 2007 at 10:29 am
Hi Stephanie,
I would suggest having columns "PaycodeA" and "PaycodeB" in your dimension table, and therefore storing both. This is because there isn't really a 'natural' key - particularly if you have to create one in the warehouse (as in your example of creating new natural keys for codes that are in B but not in A). If you throw away the paycodes from source B, then you won't be able to match fact rows from source B to your dimension.
What do you think?
Matt.
August 17, 2007 at 12:30 pm
I agree with Matt, especially because with the expansion in internet banking, it seems to me that neither set of codes is truly exhaustive. What if, down the road, the 'cash' option needs to be expanded into 'hard cash', 'internet cash', etc.? Management will change requirements
August 19, 2007 at 3:00 am
Ok, that means something like:
PaycodeKey | PaycodeA | PaycodeB | Description</TD |
1 | 10 | CH | cheque |
2 | 11 | CA | cash |
3 | 12 | direct | |
4 | 22 | intranet | |
5 | AX | axs |
Yes, I think that would seem better. At least would be able to match the codes from the conformed dimension back to the source system easily.
Thks a lot for the suggestion!
August 19, 2007 at 11:19 pm
Hi,
how about converting your PaycodeA to Char and then concatenate it with PaycodeB, it will give you One Natural Key. This way, you save one column.
How about it?
Sami
August 20, 2007 at 7:38 am
Stephenie,
Actually what you did makes good sence for the data warehouse. You don't necessarily want to use natural keys in the data warehouse. Conforming data means coming up with a single way to look at the same data from disparate data sources that use different codes for the same type of information; source system a: 10 cheque -- source system b: CH cheque. In the data warehouse you don't want to use both codes, as you would need to know both codes for queries to pull out all information for payments by cheque. What you came up with creates a mapping between the seperate systems to a common set of codes for the data warehouse.
October 15, 2007 at 3:37 am
Hi Stephanie,
My view is that you should always think of the end users first. End users want consistency and a single view of the customer. They don't want to query different codes for different products - it requires too much knowledge. Therefore, you should ask them which of the two types they'd rather query - why would they want both codes? I think they should be steered towards the alpha codes as these are easy to check. It's not intuitive that code 10 is cheque for example.
Then, assuming the users go for the character codes, convert the numeric codes into the same character codes in the ETL and use numeric surrogate keys as per the data warehousing norm. Hope this helps.
October 18, 2007 at 3:47 pm
I agreed with Alan. It is the 'Transformation' piece in the ETL process. It should be in the meta data table. You should ask the users what they wanted to see in the Data warehouse.
Meta data table should look something liked
Paycode VARCHAR
DWPaycode VARHCHAR
Paycode DWPaycode
10 Cheque
CK
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply