General cube and RDBMS architecture

  • I'm relatively new to Analysis Services, and have been asked to implement a cube for a client. I've created a cube that seems right, but am having trouble figuring out how to accomplish the following, bearing in mind that the data in the cube must represent what is held in the RDBMS and be no older than 24 hours:

    There's a deleted-indicator field in the RDBMS table that serves as the fact table. This flag may be set by the client application for anything up to a month after the record was initially created. We may also be asked at a later date to go into the RDBMS and manually delete a particular record (this could be at any time in the future). The records are sales, which can take months to complete, and a sale may be cancelled at any time up until delivery of the item is taken.

    Example:

    5th June 2003, 3:00pm - Sale created and record inserted into RDBMS

    6th June 2003, 1:00am - Sale transferred to cube using DTS

    4th July 2003, 5:00pm - Sale deleted by client app

    5th July 2003, 1:00am - Want to remove sale from the cube

    If I imported the sale into the cube for analysis purposes on June 6th, how do I remove it from the cube on July 5th? I don't want to use ROLAP for the entire cube if possible. What are my options? Is this a bad/incorrect architecture?

    Any advice greatly appreciated

    Mark

  • Mark,

    How about reprocessing the cube?

    Do you not want to do this for some reason?

    Jennifer

  • Hi Jennifer,

    I was beginning to think no one would answer, lol. Thanks 🙂

    I don't want to reprocess the cube because I only have 3 months of data in for our first client, and I've yet to refine/fully develop the cube... it takes 15-20 minutes to process already. I may have many years of data, and several clients, which could take hours.

    Analysis Services and the RDBMS are on the same server - i.e. the SQL Server attached to my company's sites - so it will have an adverse effect on all of those sites while it's processing, and the budget doesn't allow for another server at this point.

    I realise I could partition the data, but I'd be looking at dynamically creating a partition per week at most in order to get acceptable performance for a reprocess.

    I'm new to this stuff, so I'm probably missing something blindingly obvious.

    I'd considered using a DTS task and adding negative values to the cube when a record is deleted from the data source - unfortunately, I need to calculate averages etc...
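    The problem with compensating negative rows can be shown concretely. A minimal sketch in Python, using an in-memory SQLite table as a stand-in for the fact table (table and column names are made up for illustration): SUM self-corrects, but AVG does not, because the extra row inflates the COUNT.

```python
import sqlite3

# Hypothetical fact table: one row per sale, with an "amount" measure.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact_sales (sale_id INTEGER, amount REAL)")
con.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                [(1, 100.0), (2, 200.0), (3, 300.0)])

# Sale 2 is later deleted in the RDBMS. The compensating-row trick
# appends a negative copy instead of removing the original row.
con.execute("INSERT INTO fact_sales VALUES (2, -200.0)")

# SUM self-corrects: 100 + 200 + 300 - 200 = 400, as if sale 2 never existed.
total = con.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]

# AVG does not: COUNT still sees four rows, so 400 / 4 = 100.0
# instead of the true (100 + 300) / 2 = 200.0.
avg = con.execute("SELECT AVG(amount) FROM fact_sales").fetchone()[0]
print(total, avg)  # 400.0 100.0
```

    The same distortion hits any count-based measure in the cube, which is exactly why the negative-values approach falls down once averages are needed.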

    Any other ideas?

    Thanks

    Mark

  • You could set the source table filter in your cube to exclude those records where the flag field = 1 (or whatever your flag is). Then you should be able to get away with refreshing the cube data, which is a much less resource-intensive operation and - if my memory is correct - leaves the cube available for end users to access while it is being refreshed. I could be missing something, but it looks like this would work.
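    The effect of the source-table filter can be sketched like this - a Python/SQLite simulation (hypothetical `sales` table and `deleted` flag) of what the cube sees before and after a row is flagged. A data refresh simply re-reads the filtered source, so both SUM and AVG stay correct with no compensating rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE sales (sale_id INTEGER, amount REAL,
                                   deleted INTEGER DEFAULT 0)""")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [(1, 100.0, 0), (2, 200.0, 0), (3, 300.0, 0)])

# The filter the cube would apply to its source table:
# exclude any row whose deleted flag has been set.
SOURCE_FILTER = "deleted <> 1"

def cube_measures(con):
    """What the cube would aggregate from the filtered source."""
    return con.execute(
        "SELECT SUM(amount), AVG(amount) FROM sales WHERE " + SOURCE_FILTER
    ).fetchone()

print(cube_measures(con))  # (600.0, 200.0)

# A month later the client app flags sale 2 as deleted; refreshing the
# cube re-reads the filtered source and both measures remain correct.
con.execute("UPDATE sales SET deleted = 1 WHERE sale_id = 2")
print(cube_measures(con))  # (400.0, 200.0)
```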

    hth,

    Michael

    Michael Weiss



  • PS - check in BOL for refreshing a cube vrs processing...

    Michael

    Michael Weiss



  • Thanks Michael,

    I did a search in SQL BOL and AS BOL for "vrs" and came up with nothing. Could you possibly post a link please?

    I hadn't seen the source table filter before, but it looks like it might work. I think I've been thrown in at the deep end really - the sales info can all be updated after it's imported into the cube - it's not just the deleted field. Every time I think one idea will work, I realise why it won't shortly afterwards.

    Perhaps a good book is in order. Any suggestions? lol

    Cheers

    Mark

  • Hey, Mark...it's all just zeroes and ones...how hard can it be? 🙂

    Seriously, to answer your questions: 1) search under 'cube processing' in BOL, and 2) get a copy of Microsoft® SQL Server™ 2000 Analysis Services Step by Step by OLAP Train. And, yes, you did get thrown into the deep end... Analysis Services and OLAP in general are very complicated technologies. I have only worked with them for a little over a year myself and feel like I have only scratched the surface. Keep us posted on your progress and any issues you come across... there are some real experts on this list, so it is a good place to come for help. Good luck!

    hth,

    Michael

    PS - There are lots of good articles on AS on the MSDN site at microsoft.com...

    Michael Weiss



  • LOL! For a minute there you sounded like my boss 😛

    Thanks for the book/search advice. Much appreciated.

    Mark

  • I'm curious to know the latency of your cube data. Does it need to be updated immediately?

    I have a similar situation with one of my cubes. Every night I download all the completed work orders, but sometimes those work orders are reopened and changed. My solution is to pick the start date of the earliest accounting period that hasn't yet been closed, and reimport everything from that date every night. The table from which the cube draws information has the corresponding rows deleted every night before the new upload is added.

    Right now, for example, every night I delete (via scheduled DTS download) from the fact table all work orders completed on or after Dec 2, 2002. I then upload all the work orders completed on Dec 2 or later. Obviously this means that many work orders are loaded and deleted many times, but it's what works best for us. Once accounting tells me that Q1 is closed, I will move the date to the March start date.
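    The nightly delete-and-reload window described above can be sketched as follows - a Python/SQLite simulation with invented table and column names; the real job would be a scheduled DTS package doing the same two steps in T-SQL:

```python
import sqlite3

# Earliest date accounting has NOT yet closed; everything from here on
# is wiped and reloaded each night (ISO dates compare correctly as text).
OPEN_PERIOD_START = "2002-12-02"

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact_wo (wo_id INTEGER, completed TEXT, hours REAL)")
con.executemany("INSERT INTO fact_wo VALUES (?, ?, ?)",
                [(1, "2002-11-15", 5.0),    # closed period - never touched again
                 (2, "2002-12-03", 8.0),
                 (3, "2002-12-10", 2.5)])

def nightly_reload(con, source_rows):
    # 1. Delete everything in the open window from the fact table.
    con.execute("DELETE FROM fact_wo WHERE completed >= ?",
                (OPEN_PERIOD_START,))
    # 2. Reload the open window from the source system's current truth.
    con.executemany("INSERT INTO fact_wo VALUES (?, ?, ?)",
                    [r for r in source_rows if r[1] >= OPEN_PERIOD_START])

# Upstream, work order 2 was reopened: new completion date and hours.
current_truth = [(1, "2002-11-15", 5.0),
                 (2, "2002-12-04", 9.5),
                 (3, "2002-12-10", 2.5)]
nightly_reload(con, current_truth)
rows = con.execute("SELECT wo_id, hours FROM fact_wo ORDER BY wo_id").fetchall()
print(rows)  # [(1, 5.0), (2, 9.5), (3, 2.5)]
```

    Rows in the open window do get deleted and re-inserted night after night, but the closed period is never touched, which keeps the nightly job cheap and the reload idempotent.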

    This means the cube has a one-day latency and relatively recent information is subject to change, but that is acceptable.

  • Yes, it turns out the cube does need to be updated immediately. On Friday I had a meeting with the client, and the following is what they want (based on what I showed them):

    I have all the dimensions / measures required already (except for 2 more they want added to the relational data source too)

    They want the cube data to be correct up to the minute, if no better can be arranged (rotfl)

    There will only be one level of access - administrator (so no need to worry about security really)

    They want to connect via Excel / Pivot tables, using a spreadsheet, not web page.

    There is no need for the cube data to be completely correct - they're quite happy for me to ignore any updated / deleted data, as it will be so infrequent, and there are other reports which run off the relational source which will provide true values if necessary.

    So, what I'm planning is to add a field to the relational data source, indicating whether the record has been transferred to the cube using DTS.

    Run a monthly DTS task after the sales have been locked, checking for sales flagged as delivered but without any data in the "isTransferred" field - and transfer all of those records.

    Run the current month's cube data using ROLAP.

    I'll keep an eye on performance / common queries to see how it's being used, and make sure this model works as hoped. Hopefully the current month's data won't be required too often (and if it is, will be run from the website reports).
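    The monthly "transfer locked sales" step of this plan could be sketched like so - a Python/SQLite simulation (the `isTransferred` flag is from the plan above; the table layout and `delivered` column are my own invented stand-ins). The flag makes the job idempotent, so re-running it never double-loads a sale:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE sales (sale_id INTEGER, amount REAL,
                                   delivered INTEGER, isTransferred INTEGER)""")
con.execute("CREATE TABLE cube_fact (sale_id INTEGER, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                [(1, 100.0, 1, None),   # delivered, not yet transferred
                 (2, 200.0, 1, 1),      # already in the cube
                 (3, 300.0, 0, None)])  # still open - served via ROLAP

def monthly_transfer(con):
    # Pick up delivered sales that have never been transferred...
    pending = con.execute("""SELECT sale_id, amount FROM sales
                             WHERE delivered = 1
                               AND isTransferred IS NULL""").fetchall()
    # ...copy them into the MOLAP fact table and mark them transferred.
    con.executemany("INSERT INTO cube_fact VALUES (?, ?)", pending)
    con.executemany("UPDATE sales SET isTransferred = 1 WHERE sale_id = ?",
                    [(sid,) for sid, _amt in pending])

monthly_transfer(con)
print(con.execute("SELECT sale_id FROM cube_fact").fetchall())  # [(1,)]

# A second run transfers nothing - the flag has already been set.
monthly_transfer(con)
print(con.execute("SELECT COUNT(*) FROM cube_fact").fetchone()[0])  # 1
```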

    Any comments / suggestions on this plan are most welcome...

    Thanks

    Mark
