October 31, 2012 at 4:29 am
Hi everybody:
Right now I´m working in a BI solution and I received a set of tables with a multidimensional structure. I have a Fact table with transaction from customers and several dimensional tables. One of them is the KUNDE or customer table, in which the typical customer information is stored, like customer key, a key to another table (contact details), customer since date, etc. I also found numeric values in this table like "amount of purchases last 12 Month" or "Total shopping days" inside this table. That makes me wonder if it makes sense or not to store this measures inside the table, moreover, I'm building another table to store the customer data as a type II SCD and I´m not able to persuade the analyst to not include measures in this table.
What is your opinion about that?
Any comment would be appreciated.
Kind Regards,
October 31, 2012 at 8:14 am
I wouldn't store those in a dimension table. These are measures (albeit calculated), not attributes. These measures are facts that you should be able to calculate from the fact table (at the top of my head, the two examples you gave are very easy to calculate with TSQL).
Furthermore, these measures are very prone to change, so that would mean that you would have to update your customer dimension too often.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 31, 2012 at 8:41 am
Thanks for your quick answer. That's exactly what I thought. I think aCRM guys are a little bit lazy and don't want to make joins. Anyway, there are a couple of values that we received from an external provider inside the Customer table that I cannot change. What I do is to create new tables and store aggregated measures in a new fact table and create also a new Customer table with the appropriate fields. Now I'll try the SCD Transfomation (SSIS 2008) that lets me treat some fields as type 1 and others as type 2.
Kind Regards,
November 1, 2012 at 1:23 am
Hi Paul,
I think that storing the value " amount of purchases last 12 Month" in the dimension table is valid.Let me explain one scenario where it would be usefull.Suppose If the user has to answer the business question the top ten customers in terms of purchases for the last twelve months then in such a scenario instead of calculating those figures from the fact it would be easier to query if they are stored in the dimension table.Hope it gives a valid reason for having it in the dim table
November 1, 2012 at 1:51 pm
prakashsa2709 (11/1/2012)
Hi Paul,I think that storing the value " amount of purchases last 12 Month" in the dimension table is valid.Let me explain one scenario where it would be usefull.Suppose If the user has to answer the business question the top ten customers in terms of purchases for the last twelve months then in such a scenario instead of calculating those figures from the fact it would be easier to query if they are stored in the dimension table.Hope it gives a valid reason for having it in the dim table
So you would recalculate your entire customer dimension each month?
What if you want history on that? Duplcate your entire customer dimension each month?
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
November 1, 2012 at 1:52 pm
I think that storing the value " amount of purchases last 12 Month" in the dimension table is valid.
I agree. If the information is there, it must be valuable to your business users. You should make that information available in the dimension.
November 1, 2012 at 4:30 pm
One reason I think this is not a good practice is that the cardinality can be quite high. This can be valid to do if business regards this as important. A better solution is to build categories into the dimension, So those customers that purchased between 0 and n are type 1, etc...etc...
As long as you dont forsee these categories changing. You can even base an attribute on if a customer is one of your top ten purchasers. I dont see an issue with reprocessing the dimension often as needed.
----------------------------------------------------
November 2, 2012 at 5:53 am
Given the use case ("amount of purchases last 12 Month"), I'd consider unlikely a user query like "amount of purchases last 12 Month" = 37, so the suggestion of categories seems adequate. But something like "amount of purchases last 12 Month" < 6 might be useful, and categories might mess this up 😀
But it's always the matter of understanding the need for this information and how it'll be used.
PMWar
November 2, 2012 at 12:08 pm
"But something like "amount of purchases last 12 Month" < 6 might be useful, and categories might mess this up "
A good point but it still all fits. If the amount of people that bought 3,4,5 items is of reporting importance, then call just these categories cat3, cat4, cat5, etc.. As the number of purchases increase you can increase the span of each category in a mathematical fashion.
----------------------------------------------------
November 2, 2012 at 4:32 pm
Paul,
In a case like this I would look towards chaining fact tables. I don't know the proper term, I'm not a warehouse expert, I just build 'em.
What it would look like is something like this:
(Existing Fact Tables) -> CustomerData with only attributes (Dimension) -> FactTable storing the calculated measures for date of measure taken.
You'd have to be careful about including only the most recent fact for each customer when treating it like a continuation of the dimension, but it will both allow you to monitor changes to the measures being delivered as well as separate out the concern of modifying attributes on a dimension when indeed your measures are mobile.
From there, you can have a discussion about 'better ways to achieve the same result' with your developers.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
November 21, 2012 at 11:44 am
I think in most ways you've answered your own question. Another name for "facts" is "measures" (this is the name that SSAS uses.)
Personally, I would pull them out and create summary fact tables ("aggregations") that present the same data.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply