November 22, 2011 at 4:52 am
Hi everyone,
I would like to get your input on the best practice/method for calculating a dimension attribute that is derived from a fact record. Something along the lines of "last sale amount" attribute in the customer dimension.
Options that comes to mind...
Option1:
populate client dim, populate sales fact and then as a 3rd seperate process, calculate the attr value by querying the fact table by customer key
Option2:
During the fact etl process, calculate the value per customer id and update the client dim.
Options3:
Leave it for the reporting tool
Any other options? Thanks!
November 22, 2011 at 7:00 am
T_VR (11/22/2011)
Hi everyone,I would like to get your input on the best practice/method for calculating a dimension attribute that is derived from a fact record. Something along the lines of "last sale amount" attribute in the customer dimension.
Options that comes to mind...
Option1:
populate client dim, populate sales fact and then as a 3rd seperate process, calculate the attr value by querying the fact table by customer key
Option2:
During the fact etl process, calculate the value per customer id and update the client dim.
Options3:
Leave it for the reporting tool
Any other options? Thanks!
Personally I do not like the idea of including such calculated attribute on a DIM table - DIM table should include the attributes used to dice and slice the FACT data.
If "last amount sale" is a granular piece of data then it should be stored on FACT table, if "last amount sale" is nothing but the summarization of a invoice then such information should be located on invoice describing table like fact_invoice.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.November 22, 2011 at 7:10 am
T_VR (11/22/2011)
Hi everyone,I would like to get your input on the best practice/method for calculating a dimension attribute that is derived from a fact record. Something along the lines of "last sale amount" attribute in the customer dimension.
Options that comes to mind...
Option1:
populate client dim, populate sales fact and then as a 3rd seperate process, calculate the attr value by querying the fact table by customer key
Option2:
During the fact etl process, calculate the value per customer id and update the client dim.
Options3:
Leave it for the reporting tool
Any other options? Thanks!
In general I would prefer Opt#3. I would also not consider it as Dimensional Attribute.
November 22, 2011 at 7:14 am
Thanks for the feedback,
Maybe my exampe wasn't the best one...
If we change the dim attribute to "last purchased date"
1) would you include that in your dimension?
2) if so, which option would you go to do that?
November 22, 2011 at 7:23 am
T_VR (11/22/2011)
Thanks for the feedback,Maybe my exampe wasn't the best one...
If we change the dim attribute to "last purchased date"
1) would you include that in your dimension?
2) if so, which option would you go to do that?
Do you really think it adds any value to Customer Dimension? Can it be a property of Customer?
It's a fact and need to be calculated at runtime.
If you add it to Dimension, you need to load it in each ETL (don't know the frequency). How your SCD will identify this change?
November 22, 2011 at 7:37 am
Thanks Dev,
The main advantage for me was that this is something that is used often within queries and this solution could have made the report not hit the fact table to get the value. So basically shift the processing from the report to the ETL. However, it seems this might not be the best way of going about it 🙂
November 22, 2011 at 7:46 am
T_VR (11/22/2011)
Thanks Dev,The main advantage for me was that this is something that is used often within queries and this solution could have made the report not hit the fact table to get the value. So basically shift the processing from the report to the ETL. However, it seems this might not be the best way of going about it 🙂
You can still do it if your SCD is simple (overwrite). I would still not prefer it.
November 22, 2011 at 8:57 am
T_VR (11/22/2011)
The main advantage for me was that this is something that is used often within queries and this solution could have made the report not hit the fact table to get the value. So basically shift the processing from the report to the ETL. However, it seems this might not be the best way of going about it 🙂
I see... specification suddendly changed from "last purchased amount" to "last purchased date" - no problem 😀
"last purchased date" is certainly a granular attribute describing a purchase. Hard to believe a query against data warehouse would ever not hit a fact table, after all there are the "facts" business people likes so much to see 🙂
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.November 23, 2011 at 12:40 am
PaulB-TheOneAndOnly (11/22/2011)
I see... specification suddendly changed from "last purchased amount" to "last purchased date" - no problem 😀
Hey, if business can do it so can I 😀
Thanks for all the input! much appreciated!
November 24, 2011 at 8:43 am
The main advantage for me was that this is something that is used often within queries and this solution could have made the report not hit the fact table to get the value. So basically shift the processing from the report to the ETL. However, it seems this might not be the best way of going about it 🙂
Personally from what you're saying I think this probably is the best way of going about it, can there are potentially good uses for it in a dimension, although it is unusual, at least in my experience.
November 24, 2011 at 4:25 pm
I guess YMMV, depending on your report vs ETL performance, but in general I would have expected it to be more expensive to do it in the ETL, quite cheap to do it in the report, and also the data is properly factual.
Having said that, I have done it in the ETL before (at someone elses insistance) but more often pushed it back successfully.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply