June 19, 2012 at 8:37 am
Hi, I'm looking for some general advice regarding my warehouse structure. I've gone through the 2012 Adventure Works tutorial on Microsoft. But I'm still green on the warehouse structure. I'm half tempted to just jump right in and find all the little mistakes as I go along. But I'm worried that I'll get too deep and never find the time to correct them. So let me give some insight to what I'm trying to put together and where I'm getting stuck.
The business is a car sales company. Everything starts with an application, and that application goes through multiple phases. At each phase, new data is gathered on the customer and stored. The application can stop (if no sale is made) at any phase and no other information is gathered or stored. If a sale is made, the application becomes a loan, and that loan is then tracked to either maturity or default. The ending phase of the application is important because we want try to analyze which customers never get sold to, default, or make it all the way to maturity.
Customers can repeat and have multiple applications/leases over time, but an application is unique, the information gathered is frozen and never updated. Even information regarding the customer because we want to know the age, credit score, marital status, etc at the time of sale. I'll give a simple example below.
Phase 1: Initial Entry
- We capture customer name
Phase 2: Budget
- We capture customer income and expenses.
Phase 3: Pricing
- We capture vehicle cost.
Phase 4: Score
- We capture credit score.
Phase 5: Booking
- we capture loan ID, and track until maturity or default.
So I see a number of strategies here. But I'm just to green too know the pros and cons of each. Should I have 1 fact table? 5? One Dim Table? 2? None?
Thanks in advance.
June 19, 2012 at 8:51 am
Facts are usually numeric, additive and have a value. So on first blush, I'd say you have a FactApplication that has a count and ApplicationAmount and probably a FactLoan measure as well. And dimensions for Date, Customer and Vehicle.
Take a look at: http://it.toolbox.com/blogs/enterprise-solutions/identifying-fact-tables-data-warehouse-20810
HTH,
Rob
June 19, 2012 at 9:12 am
Thanks, thats a good article. It looks like what I'm struggling with is common. The pieces that keep throwing up road blocks for me are things like Customer Income. On its face this is an attribute of the customer, something I'll probably want SSAS to do groupings on. But I think that in reality this is an attribute of the application, a fact. -- Better example. Number of dependents. This is definitely something I can measure and group on. Do i put it on the fact table or the Dimension table?
The other piece, is the missing data. If I have just one fact table for all applications, I'm going to have missing chunks for those apps that never hit all 5 phases. I'm pretty sure that SSAS handles that correctly, but is that good structuring? thats why i was kind of considering having a fact table for each phase.
July 19, 2012 at 11:01 pm
Anything you want to aggregate on should be in the Fact table and anything you want to perform groupings on should be in your dimension table.
The design of your data warehouse will heavily depend on what exactly you want to measure.
If you simply want to know the number of applications in different statuses:
FactApplication :Create a date field for each status.
DimCustomer (Type2) This allows a single customer to have multiple records representing changes to his/her information like income and dependents.
DimVehicle (Type2 or Type1) If you need to track the changes happening to a vehicle, you should use a Type 2.
DimDate (Type1)
DimStatus (Type1)
The status dimension can updated to maintain the most current status of an application and RECORDCOUNT on the cube will give the count of applications in each dimension. This strategy requires constantly updating the Fact as the application moves along the workflow.
You could also do a INSERT ONLY Fact with a snowflake dimension:
DimApplication (Type1) This will contain all application details like Income and Number of dependents and a Key to link it to the DimCustomer and DimVehicle.
DimCustomer (Type1) This table would contain all the base customer information that is static, like SSN, Name, DOB
DimVehicle (Type2 or Type1) If you need to track the changes happening to a vehicle, you should use a Type 2.
DimDate
DimStatus
FactApplication: ApplicationKey, DateKey, StatusKey. ****
In this design, you insert a new record into the fact whenever the status of an application changes. You would do your updates to application details on DimApplication. This design allows you to added more statuses without having to add more fields for each status date, but if your workflow is hard set, the first design should work.
You can also do bucketing on a numerical Dimension field for things like the Income Dimension.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply