Incremental updates to dimensions and lazy aggregation

  • Hi Guys

    I'll explain the problem first: we have a cube containing historic data. The cube is partitioned by month and contains data for 2 full years plus the current year, so between 25 and 36 months' worth of data is held. A month's worth of data is around 5 million fact rows.

    A couple of the dimensions are changing dimensions and a monthly process involves incrementally updating the dimensions and fully processing the newly added month partition.

    As the dimensions are being incrementally updated, this fires off lazy aggregation of all the cube partitions in the background. It takes around the same time, roughly 15 minutes, to lazily process a partition as it does to fully process that partition.

    The thing is that, as the cube holds historic data, the data does not change, so there is no need for any sort of processing to be carried out on past partitions.

    As far as I am aware, there is no way to prevent lazy processing from being kicked off by Analysis Services, so I am looking for options that would prevent unnecessary processing from being carried out.

    Right... that's the background!

    What follows is based on how I think things work... If I'm wrong, please say so!

    The obvious (well, I think so) solution would be to have 3 cubes, one for each year's worth of data, and put a virtual cube on top of them to present the data to the users as a whole.

    This is not a problem using shared dimensions for the 3 cubes. As the structure of the cubes is identical, each measure would need to be created as a calculated member that sums the values from each cube, with each cube's own measures set to hidden (a rough sketch of what this might look like follows below).
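
    Purely as an illustration, and with made-up cube and measure names, the calculated member in the virtual cube could look something like this in MDX, assuming the three yearly cubes contribute hidden base measures [Sales 2004], [Sales 2005] and [Sales 2006]:

        -- Hypothetical names, for illustration only: the three base measures
        -- come from the three yearly cubes and are hidden from users.
        CREATE MEMBER [Sales Virtual].[Measures].[Sales Amount]
        AS '[Measures].[Sales 2004] + [Measures].[Sales 2005] + [Measures].[Sales 2006]'

    Because any given cell only has data in one of the yearly cubes, the simple sum just returns that cube's value.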

    The problem with this is that, as shared dimensions are being used, an incremental update would again trigger lazy aggregation of the partitions of each cube, so I'm back to square one.

    In order to prevent lazy aggregation, each of the cubes needs to be self-contained, so the dimensions should not be shared. This involves a bit more work, as the dimensions will need to be created 3 times, but at least the cubes will be separate.

    The problem now is how to combine these cubes to present them to the users. A virtual cube can be used, as before, but now, as each cube has its own set of dimensions, 3 sets of dimensions are shown in the virtual cube!

    So (finally, I hear you say!) the question is: how can I join these 3 cubes in order to present them as a single cube to the users? Is there a way to get round the issue of multiple dimensions when using a virtual cube? Or is there a different way to solve the overall problem of avoiding lazy processing of everything?

    I would appreciate any comments and advice,

    Thanks

    Mark

  • Mark,

    Did you fix this issue? I have a similar problem here: I have 3 changing dimensions on my cube, and the cube has around 200 partitions sliced at the daily level.

    When I tested in my test environment with 1 month of data (around 5 million records per day) and the rest of the partitions empty, the partitions, all designed with aggregations, took around an hour to be recalculated after incrementally processing the changing dimension.

    In total there were 6 months of daily partitions.

    Could you please let me know if you found a fix for this?

  • GGKK - you have most likely over-partitioned your cube. You should look at the Project REAL documentation for some tips on partitioning. With only 5 million rows per month, you should probably have quarterly partitions, or possibly monthly - daily partitioning is probably hurting your performance.

    As for the actual question: if you set the attribute relationships in your dimension to Rigid, the lazy aggregator will know it does not have to update any cube information or re-create any indexes when you incrementally process the dimension. This is because, with rigid relationships, an incremental update cannot modify the existing aggregations; it can only add to them (see the sketch below for where this setting lives).

    This does mean that your attribute relationships cannot change. If you cannot set your attribute relationships to Rigid, you actually do need the lazy aggregator to do its work.
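
    For reference, here is a rough sketch of where that setting lives in a dimension's XMLA/ASSL definition in SSAS 2005 (attribute names are made up and unrelated elements are omitted). RelationshipType defaults to Flexible, so Rigid has to be set explicitly:

        <!-- Hypothetical Date dimension: the Month attribute with a rigid relationship to Quarter -->
        <Attribute>
          <ID>Month</ID>
          <Name>Month</Name>
          <AttributeRelationships>
            <AttributeRelationship>
              <AttributeID>Quarter</AttributeID>
              <RelationshipType>Rigid</RelationshipType>
              <Name>Quarter</Name>
            </AttributeRelationship>
          </AttributeRelationships>
        </Attribute>

    This is the same property that the dimension editor in BIDS exposes as the relationship type on each attribute relationship.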

  • GGKK

    Sorry for the delay in replying, but I was away on holiday.

    No... I have not resolved this issue yet. I have currently restricted the data held in the cubes to a rolling 25 months, so at least they will process in a similar timescale each month.

    I didn't mention in the original post that I am using AS2000. I understand from replies elsewhere that AS2005 may provide more options for controlling aggregations... I am investigating this at the moment.

    Good luck

  • Michael,

    We decided to drop the designed aggregations for now, as they were taking too much resource. By the way, I can't find an option to set the attribute relationship to "Rigid". Sorry, I am a novice with cubes. Could you guide me?

    Thanks for your response.

    GGKK..

  • Attribute relationships are set up in the dimension editor in BIDS. If you click on the node of an attribute relationship in the designer, the relationship type will be in the properties.

    If you do not know where this is, you should get a book about Analysis Services 2005. This is the most important part of setting up a cube and has a huge impact not only on the performance of processing, but also on the performance of querying.

  • Okay... Actually I am working in MSAS 2000. Is that something specific to the 2005 version?

  • Huge difference - the architecture is completely different. Your problem is most likely a structural one in your cube design, probably in your hierarchies.

  • If you select the dimension at the top, you will find the "Attribute Relationships" tab. There, right-click an attribute and choose to create a new relationship; that is where you will find the type of relationship you want to set.
