January 28, 2008 at 4:44 am
Dear all,
I am familiar with Dimensional Modelling but here is a type of question I have not had to address before and I don't quite know how to model/handle this.
We supply financial products to customers. A customer applies through an "Application".
Depending on various factors, an Application may require an action from an Operator in order to progress. We call this a "Task".
Once a "Task" is fulfilled, another "Task" may be necessary, and there may be a delay before that next task starts (the end of one task is not necessarily the beginning of the next).
Tasks are grouped in Categories.
We would like to be able to answer questions like: "What were the Applications that moved from a Task of category X to a Task of category Y within a particular period?"
My guess would be that I will have a Fact table of Tasks with the following columns (foreign key to dimension tables)
- TaskApplication -> DimApplication (which application this task applies to)
- Task_StartDateTime -> DimDate
- Task_EndDateTime -> DimDate
- TaskType -> DimTask (Type and Category)
My difficulty is then: how do I (quickly) identify all applications that moved from one task of type X to another of type Y?
Am I right in thinking that I should have a column in my fact table indicating the Task Rank for each Application (first task for that application is 1, second 2, etc)?
Eric
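For what it's worth, a rank column like the one described above could be derived at load time with a window function. This is only a minimal sketch using Python's built-in sqlite3 so it can be run anywhere; the table name FactTask, the column names and the sample rows are assumptions for illustration, not the real schema. ROW_NUMBER() is also available in SQL Server 2005 and later.

```python
import sqlite3

# In-memory database with a hypothetical task fact table (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactTask (
    ApplicationKey     INTEGER,
    TaskType           TEXT,
    Task_StartDateTime TEXT,
    Task_EndDateTime   TEXT
);
INSERT INTO FactTask VALUES
    (1, 'X', '2008-01-02 09:00', '2008-01-02 10:00'),
    (1, 'Y', '2008-01-03 09:00', '2008-01-03 11:00'),
    (2, 'Y', '2008-01-04 09:00', '2008-01-04 09:30'),
    (2, 'X', '2008-01-05 09:00', '2008-01-05 09:45'),
    (2, 'Z', '2008-01-06 09:00', '2008-01-06 10:00');
""")

# Number each application's tasks in start-time order: first task = 1, second = 2, ...
ranked = conn.execute("""
SELECT ApplicationKey,
       TaskType,
       ROW_NUMBER() OVER (
           PARTITION BY ApplicationKey
           ORDER BY Task_StartDateTime
       ) AS TaskRank
FROM FactTask
ORDER BY ApplicationKey, TaskRank
""").fetchall()

for row in ranked:
    print(row)
```

In a real load the rank would be written into the fact table by the ETL rather than computed at query time.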
January 28, 2008 at 8:26 pm
What if you use a factless fact table to keep track of each application's moves between tasks? (Kimball uses this technique to record that an event happened; you can find it in his book.)
FromTask - Dim Task
FromDate - Dim Date
ToTask - Dim Task
ToDate - Dim Date
Application - Dim Application
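A rough sketch of how such a transition table might be populated in the ETL and then queried, using Python's built-in sqlite3 so it is runnable as-is. The names FactTask and FactTaskTransition, the use of the earlier task's end as FromDate and the later task's start as ToDate, and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical source: one row per task, dates as ISO strings.
CREATE TABLE FactTask (
    ApplicationKey INTEGER,
    TaskCategory   TEXT,
    StartDate      TEXT,
    EndDate        TEXT
);
INSERT INTO FactTask VALUES
    (1, 'X', '2008-01-02', '2008-01-02'),
    (1, 'Y', '2008-01-03', '2008-01-04'),
    (2, 'Y', '2008-01-04', '2008-01-04'),
    (2, 'X', '2008-01-05', '2008-01-05'),
    (2, 'Z', '2008-01-06', '2008-01-07'),
    (3, 'X', '2008-01-10', '2008-01-11'),
    (3, 'Y', '2008-02-15', '2008-02-16');

-- ETL step: pair each task with the next task of the same application.
CREATE TABLE FactTaskTransition AS
SELECT * FROM (
    SELECT ApplicationKey,
           TaskCategory AS FromTask,
           EndDate      AS FromDate,
           LEAD(TaskCategory) OVER (PARTITION BY ApplicationKey ORDER BY StartDate) AS ToTask,
           LEAD(StartDate)    OVER (PARTITION BY ApplicationKey ORDER BY StartDate) AS ToDate
    FROM FactTask
)
WHERE ToTask IS NOT NULL;  -- the last task of an application has no successor
""")

def moved(conn, from_task, to_task, period_start, period_end):
    """Applications that moved from from_task to to_task within the period."""
    rows = conn.execute(
        """
        SELECT DISTINCT ApplicationKey
        FROM FactTaskTransition
        WHERE FromTask = ? AND ToTask = ?
          AND ToDate >= ? AND ToDate <= ?
        ORDER BY ApplicationKey
        """,
        (from_task, to_task, period_start, period_end),
    ).fetchall()
    return [r[0] for r in rows]

print(moved(conn, "X", "Y", "2008-01-01", "2008-01-31"))
```

With this layout the reporting query needs no self-join; the pairing work is done once during the load.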
January 29, 2008 at 3:00 am
You may be right.
I'll have to go back to his books (which I read some time ago).
Thanks
January 29, 2008 at 8:10 am
While this approach will work, it looks like you're creating a highly sparse cube structure. (This is often a sign that BI technologies are being used for other purposes.)
From what I can see, you would be just as far ahead simply designing a report from your operational data store, rather than trying to model this into a dimensional form. Some general guidelines that I've used in dimensional modelling:
- The dimension should be fully specified, without regard to other dimensions (or fact tables for that matter).
- Dimension values should be reusable via the intersection of the fact tables to other dimensions. For example, it should be a valid construct, where both customer A and customer B point to the same application. (I suspect that's not true in this case.)
- Ideally, fact values have some aggregation associated with them. BI is used to quickly identify trends, do rollups, etc.
Based on the comments from your original post, I'd expect the following:
Customer dimension - track all characteristics of the customer
Time dimension - This is likely the "date sold", or when it was initiated.
Task Type dimension - type, category, etc.
Product dimension - details on the various financial products
The "data dimension" represents the application, and its various stages:
Measures:
Time executing task
Application ID (rollup a distinct count of the application IDs)
Start date (rollup the minimum date)
End Date (rollup the maximum date)
Operator (rollup a distinct count of the operators)
Note: if the same task can occur multiple times for the same application, then you'll need to add a second time dimension.
I hope this helps.
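To make the rollups listed above a little more concrete, here is a small sketch of those measures aggregated by task category, run through Python's built-in sqlite3. The table, its columns (including MinutesExecuting) and the sample rows are made up for illustration; in a cube the same aggregations would be defined on the measures rather than written as a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical fact rows: one per task executed for an application.
CREATE TABLE FactTask (
    ApplicationID    INTEGER,
    OperatorID       INTEGER,
    TaskCategory     TEXT,
    StartDate        TEXT,
    EndDate          TEXT,
    MinutesExecuting INTEGER
);
INSERT INTO FactTask VALUES
    (1, 10, 'X', '2008-01-02', '2008-01-02', 60),
    (1, 11, 'Y', '2008-01-03', '2008-01-04', 120),
    (2, 10, 'X', '2008-01-05', '2008-01-06', 45),
    (2, 10, 'X', '2008-01-08', '2008-01-08', 15);
""")

# One row per category: total time, distinct applications,
# earliest start, latest end, distinct operators.
rollup = conn.execute("""
SELECT TaskCategory,
       SUM(MinutesExecuting)         AS TotalMinutes,
       COUNT(DISTINCT ApplicationID) AS Applications,
       MIN(StartDate)                AS FirstStart,
       MAX(EndDate)                  AS LastEnd,
       COUNT(DISTINCT OperatorID)    AS Operators
FROM FactTask
GROUP BY TaskCategory
ORDER BY TaskCategory
""").fetchall()

for row in rollup:
    print(row)
```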
January 29, 2008 at 8:58 am
Kind of... Makes me think!
I suggested using a data mart instead of fulfilling the client's report requests one by one, and we are not planning to use Analysis Services (cubes) for the time being.
In essence, I am just trying to create an easy base for reporting.
Our view of the data is also partial, as we are only an "intermediary" (the web front end to mainframe back ends). Our client uses their own data warehouse in-house.
I'll do a bit of reading
Thanks.
January 30, 2008 at 2:05 pm
Hi Eric,
Your Fact Table looks just right to me. You simply need a query to answer your question. I see no reason why you can't have a query that interrogates the Fact table twice - something like:
SELECT DISTINCT FACT_FROM.TaskApplication
FROM FACT_TABLE FACT_FROM,
FACT_TABLE FACT_TO,
DIMDATE DATE_FROM_START,
DIMDATE DATE_TO_START,
DIMDATE DATE_TO_END,
DIMTASK TASK_FROM,
DIMTASK TASK_TO
WHERE FACT_FROM.STARTDATETIME = DATE_FROM_START.DATETIME
AND FACT_TO.STARTDATETIME = DATE_TO_START.DATETIME
AND FACT_TO.ENDDATETIME = DATE_TO_END.DATETIME
AND FACT_FROM.TaskApplication = FACT_TO.TaskApplication
AND DATE_TO_START.DATETIME >= @DATE_FROM
AND DATE_TO_END.DATETIME <= @DATE_TO
AND DATE_FROM_START.DATETIME <= DATE_TO_START.DATETIME -- Make sure X occurs before Y
AND FACT_FROM.TASK_TYPE_KEY = TASK_FROM.TASK_KEY
AND FACT_TO.TASK_TYPE_KEY = TASK_TO.TASK_KEY
AND TASK_FROM.TASK_TYPE = 'X'
AND TASK_TO.TASK_TYPE = 'Y'
I know the query above is not perfect and I've not used ANSI joins etc but hopefully it shows how you can get the answer to your question using your chosen fact table structure.
January 31, 2008 at 3:42 am
I must spend a bit of time looking into the factless fact idea. I am in no rush.
However, if I went for the initial solution, I suspect I would still need to work out the ranking between tasks, because the precise moment an application "moves" from a task of type X to a task of type Y is when the first task of type Y following a task of type X starts...
I would then have a query like
SELECT DISTINCT TaskFrom.ApplicationKey
FROM TaskFact TaskFrom
INNER JOIN TaskFact TaskTo ON TaskFrom.ApplicationKey = TaskTo.ApplicationKey
AND TaskFrom.TaskRank = TaskTo.TaskRank - 1 -- TaskTo is the task immediately after TaskFrom
AND TaskFrom.TaskType = 'X'
AND TaskTo.TaskType = 'Y'
WHERE TaskTo.Task_StartDateTime >= @MinDate -- the move happens when the Y task starts
AND TaskTo.Task_StartDateTime <= @MaxDate
I suspect the factless fact idea is better because it's all worked out upfront in the ETL and the final query becomes straightforward, which is pretty much my goal as we won't use cubes.
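For comparison, the rank-based self-join can be tried against a few sample rows with Python's built-in sqlite3. This is only a sketch: the TaskFact layout, the precomputed TaskRank values and the data are assumptions, and the type is stored directly on the fact row here instead of going through DimTask.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical fact table with the rank already filled in by the ETL.
CREATE TABLE TaskFact (
    ApplicationKey     INTEGER,
    TaskRank           INTEGER,
    TaskType           TEXT,
    Task_StartDateTime TEXT
);
INSERT INTO TaskFact VALUES
    (1, 1, 'X', '2008-01-02 09:00'),
    (1, 2, 'Y', '2008-01-03 09:00'),
    (2, 1, 'X', '2008-01-04 09:00'),
    (2, 2, 'Z', '2008-01-05 09:00'),
    (2, 3, 'Y', '2008-01-06 09:00'),
    (3, 1, 'X', '2008-01-10 09:00'),
    (3, 2, 'Y', '2008-02-15 09:00');
""")

def moved_by_rank(conn, from_type, to_type, min_date, max_date):
    """Applications whose task of to_type directly follows a task of from_type."""
    rows = conn.execute(
        """
        SELECT DISTINCT TaskFrom.ApplicationKey
        FROM TaskFact TaskFrom
        INNER JOIN TaskFact TaskTo
                ON TaskFrom.ApplicationKey = TaskTo.ApplicationKey
               AND TaskFrom.TaskRank = TaskTo.TaskRank - 1
        WHERE TaskFrom.TaskType = ?
          AND TaskTo.TaskType = ?
          AND TaskTo.Task_StartDateTime >= ?
          AND TaskTo.Task_StartDateTime <= ?
        ORDER BY TaskFrom.ApplicationKey
        """,
        (from_type, to_type, min_date, max_date),
    ).fetchall()
    return [r[0] for r in rows]

print(moved_by_rank(conn, "X", "Y", "2008-01-01 00:00", "2008-01-31 23:59"))
```

Application 2 is left out because a Z task sits between its X and Y tasks, and application 3 because its Y task starts outside the period. A query that only checks that X started before Y would include application 2.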