May 1, 2009 at 10:31 am
Ok folks, I designed a fairly simple cube with 2 measures and 4 dimensions to analyze data from 4 different accounting systems. Basically I take a list of accounts, balances, and the date the balance was accurate, and trend which investment accounts are growing and which are shrinking. In order to look at accounts from all the systems I used a DW design where I imported all the accounts, then used a view to format all the accounts in the same way. This worked very well for me until I presented it to my team.
My team seems to think that I should not create a DW and should pull the information directly from the original 4 systems. I am looking for resources on what is best practice for this as I'm rather new to it. But it seems to me that the downsides are that without the DW if I loose my cube or it corrupts then I loose the data forever. WIth my dw I can store years worth of data and do a full refresh if needed.
Also, I'm doing a LOT of formatting in the DW before the cube imports it. For example, in one system employees are listed by name, and in another by a code, and in another by initials. In my view I use joins to an employee table so that the employee is always shown as a name to the cube. If I went directly to the original source I would be having to find a way to do that calculated columns in the cube before I could relate the different accounts. Is there any official opinion on whether its better to do that kind of substituting before the cube see's it? I'll give an example below to illustrate.
System A has 3 columns
account# Account Balance Employee
123 $4 John Doe
System B has 3 columns
acount# Account Balance Employee
345 $34 W321
My view that the cube see's is called wm.accountsall and formats the accounts to look like this
account# Account Balance Employee
123 $4 John Doe
345 $34 Jane Smith
So, thoughts? Would it be better to keep doing this formatting/substitution in the DW or do it in the cube?
May 1, 2009 at 1:33 pm
I'm not sure what 'best practices' are out there for this, but I agree with your line of thought. I personally would handle this by modeling out the data with a star-schema design. You'll need one fact table and then whatever dimensions you'll want to use (create yourself a static dates dimension as well).
I would use SSIS to ETL the data from each source into your star-schema and then build your DSV for your cube off of the star-schema. Pretty straight forward.
As you said, you'll have the advantage of being able to keep a balance history in your datamart and that historical data will prove important as you get into looking at trends and forcasting.
May 1, 2009 at 2:01 pm
John Rowan (5/1/2009)
I'm not sure what 'best practices' are out there for this, but I agree with your line of thought. I personally would handle this by modeling out the data with a star-schema design. You'll need one fact table and then whatever dimensions you'll want to use (create yourself a static dates dimension as well).I would use SSIS to ETL the data from each source into your star-schema and then build your DSV for your cube off of the star-schema. Pretty straight forward.
As you said, you'll have the advantage of being able to keep a balance history in your datamart and that historical data will prove important as you get into looking at trends and forcasting.
Hey John, this is exactly what I have done. I actually have 1 fact table and 2 dimension tables. Two of my hierarchies come from the fact table as they are artificial hierarchies in the form of account type, employee, account. And then Employee, Account type, account.
My challenge now is figuring out how to make this work by connecting to csv's that only have 1 month worth of data at any time, and then updating my cube with new data without loosing the old data from a new csv with a new months worth of data. The more I look at it the more it seems like a very very complicated way to achieve my goal when importing into a DW is just so easy and does the job wonderfully.
Thanks for your advice as I work through this!
May 2, 2009 at 10:57 am
An advantage to using a DW is you benefit from having a known structure designed specifically to support analysis. If you use the source systems structure and there is a desire to change the schema, you only need to change your extract piece of your ETL process. If you are tied to the system, they break anything else tied to it potentially. In the long run, it's easier to maintain a system abstracted from your source in my experience.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply