April 18, 2016 at 1:53 am
Hi Everyone,
I would like to ask for your opinion regarding this topic.
Background: We have a enterprise data warehouse which feeds our data marts. Usually we follow a bottom-up approach in the data mart design, that means, we are transferring all what we think is useful from DW to the data mart and try to model this in a simple start schema.
Requirements: business requirements are build on top of this tables- Usually we have to deliver KPIs and reporting specific calculations. I.e., suppose we have two columns in a fact table: DeliveryDate, ConfirmedDate. There is a KPI "IF DeliveryDate - ConfirmedDate <= 3 Then OnTime ELSE Late"
Discussion: The reports are running on top of a cube, which takes the date from a view layer. In the past I programmed the requirements above in this view layer. Then if something changes, like the threshold from 3 days to 2 or 4, only a view needs to be adjusted and the cube processed.
Some colleagues argue, that the business users need to find this data in the data mart tables, therefore the want to include these calculations in the ETL process and already create a column in the fact table "Ontime".
Actually it depends on the usage from our systems, because this calculation could also be implemented directly in the cube, but of course the business users should only be allowed to use the cube.
Could you please share your experience with me regarding this topic?
Any comment will be appreciated.
Kind Regards,
April 19, 2016 at 6:31 am
Strictly speaking I think this should reside in the data layer as it is an attribute of each fact.
IMHO, the KPI would be more along the lines of "What proportion are OnTime?" or "Are more than 95% OnTime?" in which case that definitely resides in the reporting layer.
Rather than define a specific transformation and persist the data in the fact table, you could create a computed column on the fact table or include the calculation in the view layer.
Another reason for this being in the data layer is that if there is a different reporting mechanism (e.g. export to file) then the data is available but if the calculation is in the cube then you would have to duplicate the code for the other reporting mechanism.
Jez
April 19, 2016 at 7:06 am
Hi Jez,
Thanks for your answer. I guess there is no single best practice that satisfies everyone.
I will use the following guidelines:
- Try to add the fact based calculations to the fact tables, in order to avoid that users or applications can dispute the values.
- For the “soft-rules” use computed columns instead views or hardcode them in ETL packages. In this way if some definition slightly changes, no SSIS package needs to be adjusted and redeployed. If all of the needed columns for the calculations are included in the fact table, there are at least two options: 1) include these columns in the fact table, 2) use a view.
- Use computed columns with persisted values. I need to test the performance of writing in such a tables, but a data mart should be optimized for reading.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply