Build Cube off of summary tables?

  • Hi,

    I work at a law firm and we have a Data Fusion - IntelliStat warehouse set up.

    It stores data at the atomic level, as well as in many rollup or summary tables.

    To give you some background, we have many clients. Each client has many matters. Each matter has many timecards.

There are fact tables for each level. The atomic level is the timecard fact table. The next level up summarizes timecard information for each matter, and the level above that summarizes it for each client.
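    Roughly speaking, the structure looks something like this (simplified, with illustrative names rather than our actual schema):

        -- Atomic grain: one row per timecard
        CREATE TABLE FactTimecard (
            TimecardKey  INT PRIMARY KEY,
            MatterKey    INT NOT NULL,
            ClientKey    INT NOT NULL,
            DateKey      INT NOT NULL,
            HoursWorked  DECIMAL(9,2),
            BilledAmount DECIMAL(19,4)
        );

        -- Rollup grain: one row per matter
        CREATE TABLE FactMatterSummary (
            MatterKey    INT PRIMARY KEY,
            ClientKey    INT NOT NULL,
            TotalHours   DECIMAL(19,2),
            TotalBilled  DECIMAL(19,4)
        );

        -- Rollup grain: one row per client
        CREATE TABLE FactClientSummary (
            ClientKey    INT PRIMARY KEY,
            TotalHours   DECIMAL(19,2),
            TotalBilled  DECIMAL(19,4)
        );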

My question is: should I build the cube using just the atomic-level fact table and let the cube itself handle the aggregations, or should I include the pre-aggregated summary tables as well?

    Thank you very much for your help.

  • My general opinion is that fact tables are best at the most granular level. This gives you the most flexibility with regard to your analysis and reporting.

    The best argument for summary facts is when you start to suffer performance problems due to the volume of data at a finer grain. At that point you have to make sure that your date/time dimension conforms to the summary level, so you may need to do some work there as well.
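    For example, if you summarize the timecard fact to month grain, the date dimension has to roll up to month with it. A sketch, with assumed table and column names:

        -- Hypothetical monthly summary fact: note it keys to a
        -- month-grain dimension, not the daily date dimension
        CREATE TABLE FactTimecardMonthly (
            ClientKey   INT NOT NULL,
            MatterKey   INT NOT NULL,
            MonthKey    INT NOT NULL,   -- references DimMonth below
            TotalHours  DECIMAL(19,2),
            TotalBilled DECIMAL(19,4)
        );
        GO

        -- A conformed month-level rollup of an assumed daily DimDate
        -- (YearNumber and MonthNumber are assumed column names)
        CREATE VIEW DimMonth AS
        SELECT DISTINCT
            YearNumber * 100 + MonthNumber AS MonthKey,
            YearNumber,
            MonthNumber
        FROM DimDate;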

  • That makes sense.

Just to be clear: all fact tables in a cube have to be the same granularity, right? For example, I couldn't have a daily-values (atomic) fact table and a yearly-values (summary) fact table in the same cube, correct?

    Thanks again.

Not correct; you can definitely have fact tables with differing grains in the same cube. I think the general argument being put forward was to use, where possible, the most granular data you have for a given subject area, and let the engine handle the aggregations. The only issue I can see you hitting is some form of aggregation that the current system does that you can't replicate in SSAS (not sure what that would be, though 🙂 ).
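    To illustrate with assumed names: in SSAS, each fact table becomes its own measure group, related to the date dimension at its own level. In relational terms, both of these answer the same yearly question from facts at different grains:

        -- From the daily (atomic) fact, rolled up at query time
        SELECT d.YearNumber, SUM(f.BilledAmount) AS BilledFromDaily
        FROM FactTimecard AS f
        JOIN DimDate AS d ON d.DateKey = f.DateKey
        GROUP BY d.YearNumber;

        -- From a yearly (summary) fact, already at year grain
        SELECT y.YearNumber, y.TotalBilled AS BilledFromYearly
        FROM FactYearlySummary AS y;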

    Steve.

  • Hi Steve,

    I appreciate your help with this.

    If I understand correctly, summary tables are created in a warehouse to help with query performance, since the aggregations are pre-calculated.

    Also, when a cube is processed, the aggregations are again pre-calculated and stored in the cube.

So pulling pre-aggregated data from a cube is effectively the same as pulling pre-aggregated data from a summary table, and therefore there is no real benefit to including the summary tables as fact tables in the cube. Correct?
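    In other words (using the illustrative names from my first post), the matter summary is just this rollup over the atomic fact, pre-computed and stored:

        SELECT MatterKey,
               SUM(HoursWorked)  AS TotalHours,
               SUM(BilledAmount) AS TotalBilled
        FROM FactTimecard
        GROUP BY MatterKey;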

    I'm basically trying to see if adding summary tables to my cube will help with performance for the end user, because right now when they create a pivot table using the cube as a data source, it's pretty slow.

    Thanks

Generally, yes, aggregation tables in an EDW can help query performance. That help can come through explicit use in queries, or via technology (e.g., Oracle's query optimizer) where the aggregates are used automatically without users referencing them in the query. That said, in general I'd disagree that an aggregate table in an RDBMS is the same as an aggregate within an OLAP database. Generally, you should find the MOLAP storage of base data and indexes significantly faster, and covering more of the query space, than similar tables within the relational database.
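    As a sketch of that automatic-rewrite idea (Oracle syntax, with the illustrative names from earlier in the thread): once query rewrite is enabled, the optimizer can answer a GROUP BY against the base fact directly from the materialized view, even though the query never mentions it:

        CREATE MATERIALIZED VIEW mv_client_billed
        ENABLE QUERY REWRITE
        AS
        SELECT ClientKey, SUM(BilledAmount) AS TotalBilled
        FROM FactTimecard
        GROUP BY ClientKey;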

    There's no real advantage to trying to include aggregate tables in measure groups within your cube. If set up correctly, your cube aggregates should give you the performance you're looking for.

As far as the cube access being slow, there are quite a lot of factors. In the past, I've found that a lot depends on what the user is trying to bring back onscreen. For example, suppose you have no hierarchy in a customer dimension and it contains 5,000 customers. Bringing back all customers and nesting purchased products within that list, when the user is actually looking for a single customer, is a huge waste of resources and will likely perform poorly. Getting the user to put the customer in as a filter and then putting products on the rows will nearly always return a result orders of magnitude faster.
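    The relational analogue of that point, with made-up names, is the difference between these two queries; the cube query behaves the same way:

        -- Slow pattern: every customer crossed with every product
        SELECT c.CustomerName, p.ProductName,
               SUM(f.SalesAmount) AS Sales
        FROM FactSales AS f
        JOIN DimCustomer AS c ON c.CustomerKey = f.CustomerKey
        JOIN DimProduct  AS p ON p.ProductKey  = f.ProductKey
        GROUP BY c.CustomerName, p.ProductName;

        -- Fast pattern: the one customer of interest as a filter
        SELECT p.ProductName, SUM(f.SalesAmount) AS Sales
        FROM FactSales AS f
        JOIN DimProduct AS p ON p.ProductKey = f.ProductKey
        WHERE f.CustomerKey = 42    -- single customer filter
        GROUP BY p.ProductName;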

Aside from user education on how to build queries, there is a variety of things you can do on the technical side, including tracing the SSAS instance to see how many (if any!) aggregations are being used to answer queries. Consider capturing several days' worth of query activity and then letting the aggregation wizard use that information to recommend aggregations. Find the performance guide for SSAS (the 2005 one will suffice) and make sure your cube and database structure follow its recommendations as closely as possible.
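    If you enable the SSAS query log (OlapQueryLog is the default table name; I'm assuming the default columns here), something like this shows which datasets are hit most often and are therefore good candidates for usage-based aggregation design:

        SELECT Dataset,
               COUNT(*)      AS Hits,
               AVG(Duration) AS AvgDurationMs
        FROM OlapQueryLog
        GROUP BY Dataset
        ORDER BY Hits DESC;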

    HTH,

    Steve.

  • Thanks Steve, that definitely helps.
