March 10, 2014 at 8:22 am
Hi folks,
My questions today are about basic design of a Data Warehouse Solution.
In my organization, a Data Warehouse was implemented using Inmon’s approach. That means we have a large, highly normalized DW. I want to build a Data Mart for a particular group of business users, and I have several basic questions.
Suppose that the level of granularity is the sum of the sales prices of the articles on every invoice, the sum of the quantities, customer, article, date of purchase, and country:
- Is it better to perform the sum operations during the ETL process and store the data in the Data Mart at this granularity, or to store the data at a lower granularity (i.e. one line per article per invoice) and then create views with these aggregation operations?
- Suppose I have to create 4 measure groups that have 3 dimensions in common, let’s say customer, article, and date, plus one specific dimension each, such as return reason, sales type, or credit note reason. Is it better to create one big denormalized fact table with all of the required dimensions and then create a view as the data source for each measure group, or is it better to handle these details directly in SSAS?
- Can anyone mention or post a link to some guidelines for deciding whether it is more appropriate to calculate something during the ETL, in the views that feed the SSAS cubes, in the Data Source Views in SSAS, or with MDX?
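To make the granularity question concrete, here is a T-SQL sketch of the two options described above. All table and column names are hypothetical; option 2 keeps the lowest grain and derives the option-1 totals through a view:

```sql
-- Option 2 (low grain): one row per article per invoice.
-- Names are made up for illustration.
CREATE TABLE dbo.FactInvoiceLine (
    InvoiceKey   INT   NOT NULL,
    CustomerKey  INT   NOT NULL,
    ArticleKey   INT   NOT NULL,
    DateKey      INT   NOT NULL,
    CountryKey   INT   NOT NULL,
    Quantity     INT   NOT NULL,
    SalesPrice   MONEY NOT NULL
);

-- This view yields the same numbers a pre-aggregated table
-- (option 1) would hold, without giving up the detail rows.
CREATE VIEW dbo.vwInvoiceTotals AS
SELECT InvoiceKey, CustomerKey, ArticleKey, DateKey, CountryKey,
       SUM(Quantity)   AS TotalQuantity,
       SUM(SalesPrice) AS TotalSalesPrice
FROM dbo.FactInvoiceLine
GROUP BY InvoiceKey, CustomerKey, ArticleKey, DateKey, CountryKey;
```

The detail table can always be rolled up later; a pre-aggregated table cannot be drilled back down.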
The question I have to answer is whether it is better to simplify the ETL process, which is currently very complex and hard to maintain, by using SSAS features. My colleagues want to replace the current ETL with a backup restore of the relevant DW tables into the DM. I think a complete redesign is needed: the ETL process should not be replaced, but it could be simplified with a better design of the DM.
I know my question is so general that it is not easy to give a specific answer; I just want to get some best practices from your experience.
Any comment would be highly appreciated.
Best Regards,
March 11, 2014 at 8:00 pm
You should be happy that you have your data warehouse. All your data is already sourced, which is a great place to start with dimensional design.
To your questions:
Is it better to perform…
SSAS shines at doing calculations on the fly. Your grain should be the lowest possible: one line per article per invoice, with Quantity and Sales Price as your likely measures.
Create 4 ….
Typically, related measures reside in the same measure group, and each measure group has its own fact table. You could denormalize the fact table, but your ETL becomes more complex and you potentially lose history.
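A sketch of the one-fact-table-per-measure-group layout, using hypothetical names: the three shared dimensions are conformed (same keys in every fact), and each fact table carries only its own specific dimension.

```sql
-- Shared, conformed dimension keys: CustomerKey, ArticleKey, DateKey.
CREATE TABLE dbo.FactSales (
    CustomerKey  INT   NOT NULL,
    ArticleKey   INT   NOT NULL,
    DateKey      INT   NOT NULL,
    SalesTypeKey INT   NOT NULL,   -- specific to this measure group
    Quantity     INT   NOT NULL,
    SalesPrice   MONEY NOT NULL
);

CREATE TABLE dbo.FactReturns (
    CustomerKey     INT NOT NULL,
    ArticleKey      INT NOT NULL,
    DateKey         INT NOT NULL,
    ReturnReasonKey INT NOT NULL,  -- specific to this measure group
    ReturnedQty     INT NOT NULL
);
```

Each table feeds one measure group in SSAS, and the cube joins them through the shared dimensions, so no wide denormalized table is needed.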
ETL:
Personally, I prefer simple ETL, as you mention, complex ETL is hard to maintain. Generally, best practice today is to have one package per subject area.
In your scenario:
one package per dimension
one package per measure group
I assume “using SSAS features” means cube processing; those steps can go in one package.
As a nod to Koen, some book references: http://www.bidwbooks.com/learn-dimensional-modeling-basics-to-advanced/
You should watch this http://sqlbits.com/Sessions/Event11/Data_Modeling_for_Analysis_Services_Cubes
March 20, 2014 at 12:44 pm
Creating calculations in your ETL, or as a named calculation in your data source view, will result in the values being physically materialized in the cube, enabling quicker rollups/aggregations.
Creating calculated members and named sets results in calculations being done on the fly, which could potentially be slower.
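A minimal illustration of the materialized side, with hypothetical names: a named calculation in the DSV is just a SQL expression, evaluated once when the cube is processed and stored with the fact rows, so it rolls up like any other measure.

```sql
-- Named-calculation style: the expression is evaluated at
-- processing time and its result is stored with each fact row.
-- Names are hypothetical; the 19% VAT rate is just an example.
SELECT CustomerKey, ArticleKey, DateKey,
       Quantity,
       SalesPrice,
       SalesPrice * 0.19 AS VatAmount   -- materialized
FROM dbo.FactInvoiceLine;
-- An MDX calculated member with the same formula would instead be
-- evaluated on the fly at query time.
```

Note this only pays off for additive expressions; a ratio like average price must stay a query-time calculation, or summing the stored values gives wrong totals.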