June 30, 2004 at 8:51 am
I'm still rather new at data warehousing, but have come across a situation that you may be able to help guide me on.
We have a cube in AS that is built upon a view of a SQL Server table (and supporting lookup tables). This main table is called "tblCubeSquaredFact," and is the fact table (duh). tblCubeSquaredFact is regenerated every day, and has all this history of our data (2+ years). Each day, the previous days' data in included, and changes made to historical data is updated.
Therein lies the rub: because any day has data that is not just new, but changed, we always (seem) to need to do a full reprocess, which takes about 1.5 hours. However, if we do an incremental, which only takes about 10 minutes, we will get the new data, but the old data gets duplicated! So, if on 01/05/2004, we had a measure that showed $1000, if we updated something else for 01/05/2004, the next day our cube would show $2000.
This is a long way to ask the following: I'd like to be able to tell AS to do an incremental update whereby it adds new records, and deletes and reinserts changed records. Anyone have any suggestions?
Thanks!
June 30, 2004 at 9:54 am
There is no way to tell AS how to do this for an incremental update.
What you need to do is find all the records that were changed and delete them from the fact table.
Then populate the fact table with the incremental update and it will add in new records and the records deleted (i.e changed).
There are a couple of other methods you could use involving virtual cubes, but this way is the most tidiest and easiest to maintain.
Additionally, if you know that records are not changed from a certain point in time, you can just bulk delete these.
June 30, 2004 at 10:01 am
John,
Thanks for the reply.
I'm not sure I understand what you mean, though. If I first delete the records that have been updated from the fact table, and then turn around and populating the fact table with the new and updated records, when I do an incremental won't it still duplicate the data for the updated data? If I do a full reprocess, I'm right where I am now. Can you set me a bit more straight on this?
Also, can you expound on the use of virtual cubes?
Thanks!
June 30, 2004 at 10:36 am
Apologies. PLease ignore my comment. I was thinking of something else I have been doing lately.
A better option would be to use the 'REFRESH' method. This will take longer than incremental, but not by too much. Try it out and see if it makes a real difference.
With regards to virtual cubes, I don't think they will help you here. They are basically like views on tables, but instead are on cubes. I.se linking two cubes together.
July 1, 2004 at 5:05 am
Hi,
Do You have "incremental update filters" ? - see in BOL.
When Analysis Services does an incremental update, it does not not
update changed records, it only reads new records into the cube,
using the partition filters.
( each cube partition can have filters, only AS enterprise edition can have multiple partitions for a cube .
As for the updated records problem one technique is the following:
Dont' update the record, but add a new record such that the sum of the two correspond to the new values.
Or ( better imo )
Add 2 records - 1 that negates the record to be updated, and 1 with the new updated values.
/rockmoose
You must unlearn what You have learnt
July 1, 2004 at 6:36 am
A very interesting idea from rockmoose that I never thought of doing. However, I can see over time that the cube will become very hard to investigate for differences.
A good idea may be to refresh once a week and use the incremenation technique daily in between. I am going to try this out myself and see how it goes.
July 1, 2004 at 7:14 am
Some of our transaction systems use this technique. For example when a customers bill is adjusted, a new "credit" record is added which negates the first one, and then a record with the corrected amount is added.
Sometimes this can be a pain when investigating... and guess what, it is usually in cases like these that human and computer errors occur. Especially when the process spans over periods... oh well..
We also implement what Jonathan suggested - daily incremental updates, and a full process weekly or monthly.
Full Process because some of the dimensions change over time.
cheers,
rockmoose
You must unlearn what You have learnt
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply