Dimensional DB Design Question

Eric Mamet

SSChampion

Points: 11728
January 28, 2008 at 4:44 am

#186064

Dear all,
I am familiar with Dimensional Modelling but here is a type of question I have not had to address before and I don't quite know how to model/handle this.
We supply financial products to customers. A customer applies through an "Application".
Depending on various factors, an Application may require an action from an Operator in order to progress. We call this a "Task".
Once a "Task" is fulfilled, another "Task" may be necessary, and there may be a delay before that next task starts (the end of one task is not necessarily the beginning of the next).
Tasks are grouped in Categories.
We would like to be able to answer questions like: "What were the Applications that moved from a Task of category X to a Task of category Y within a particular period?"
My guess would be that I will have a Fact table of Tasks with the following columns (foreign key to dimension tables)
- TaskApplication -> DimApplication (which application this task applies to)
- Task_StartDateTime -> DimDate
- Task_EndDateTime -> DimDate
- TaskType -> DimTask (Type and Category)
My difficulty is then: how do I (quickly) identify all applications that moved from one task of type X to another of type Y?
Am I right in thinking that I should have a column in my fact table indicating the Task Rank for each Application (first task for that application is 1, second is 2, etc.)?
Eric
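
For illustration, the fact table proposed above might be sketched as follows. This is only a sketch under assumptions: the thread names just the four foreign keys and the Task Rank idea, so every key name, data type, and the choice to store both a date key and the raw datetime are hypothetical.

```sql
-- Hypothetical names and types throughout; adapt to your own conventions.
CREATE TABLE DimApplication (
    ApplicationKey INT         NOT NULL PRIMARY KEY,
    ApplicationId  VARCHAR(30) NOT NULL   -- business key from the source system (assumed)
);

CREATE TABLE DimTask (
    TaskKey      INT         NOT NULL PRIMARY KEY,
    TaskType     VARCHAR(50) NOT NULL,
    TaskCategory VARCHAR(50) NOT NULL     -- tasks are grouped in categories
);

CREATE TABLE DimDate (
    DateKey      INT      NOT NULL PRIMARY KEY,  -- e.g. 20080128 (assumed key format)
    CalendarDate DATETIME NOT NULL
);

-- One row per task carried out on an application.
CREATE TABLE TaskFact (
    ApplicationKey     INT      NOT NULL REFERENCES DimApplication (ApplicationKey),
    TaskKey            INT      NOT NULL REFERENCES DimTask (TaskKey),
    StartDateKey       INT      NOT NULL REFERENCES DimDate (DateKey),
    EndDateKey         INT      NULL     REFERENCES DimDate (DateKey),  -- NULL while the task is open
    Task_StartDateTime DATETIME NOT NULL,  -- full timestamp kept alongside the date key
    Task_EndDateTime   DATETIME NULL,
    TaskRank           INT      NOT NULL,  -- 1 = first task of the application, 2 = second, ...
    CONSTRAINT PK_TaskFact PRIMARY KEY (ApplicationKey, TaskRank)
);
```

With a TaskRank column in place, "moved from X to Y" becomes a self-join on consecutive ranks within one application.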

Loner SSC-Insane Points: 21279 · Answer 1

What if you use a factless fact table to keep track of the application? (Kimball used this method to keep track of which events happened; you can find it in his book.)

FromTask - Dim Task

FromDate - Dim Date

ToTask - Dim Task

ToDate - Dim Date

Application - Dim Application
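
A minimal sketch of what such a factless fact table and the resulting query could look like. The table name, key names, the integer date-key format, and the reading of FromDate as the end of the first task and ToDate as the start of the next are all assumptions; the post lists only the five dimension references.

```sql
-- Hypothetical names; DimTask is repeated here only so the sketch stands alone.
CREATE TABLE DimTask (
    TaskKey      INT         NOT NULL PRIMARY KEY,
    TaskType     VARCHAR(50) NOT NULL,
    TaskCategory VARCHAR(50) NOT NULL
);

-- One row per move of an application from one task to the next. No measures.
CREATE TABLE TaskTransitionFact (
    ApplicationKey INT NOT NULL,  -- -> DimApplication
    FromTaskKey    INT NOT NULL,  -- -> DimTask (task being left)
    FromDateKey    INT NOT NULL,  -- -> DimDate (assumed: date the "from" task ended)
    ToTaskKey      INT NOT NULL,  -- -> DimTask (task being entered)
    ToDateKey      INT NOT NULL   -- -> DimDate (assumed: date the "to" task started)
);

DECLARE @PeriodStartKey INT, @PeriodEndKey INT;
SET @PeriodStartKey = 20080101;   -- assumed yyyymmdd date keys
SET @PeriodEndKey   = 20080131;

-- Applications that moved from a category X task to a category Y task in the period.
SELECT DISTINCT f.ApplicationKey
FROM TaskTransitionFact AS f
INNER JOIN DimTask AS tf ON tf.TaskKey = f.FromTaskKey
INNER JOIN DimTask AS tt ON tt.TaskKey = f.ToTaskKey
WHERE tf.TaskCategory = 'X'
  AND tt.TaskCategory = 'Y'
  AND f.ToDateKey BETWEEN @PeriodStartKey AND @PeriodEndKey;
```

DimTask is joined twice, once per role, so the ordering of tasks never has to be worked out at query time.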

Eric Mamet SSChampion Points: 11728 · Answer 2

You may be right

I'll have to go back to his books (which I read some time ago).

Thanks

Dave Balsillie SSCertifiable Points: 5589 · Answer 3

While this approach will work, it looks like you're creating a highly sparse cube structure. (This is often a marker of BI technologies being used for other purposes.)

From what I can see, you would be just as far ahead simply designing a report from your operational data store rather than trying to model this into a dimensional form. Some general guidelines that I've used for dimensional modelling:

- The dimension should be fully specified, without regard to other dimensions (or fact tables for that matter).

- Dimension values should be reusable via the intersection of the fact tables to other dimensions. For example, it should be valid for both customer A and customer B to point to the same application. (I suspect that's not true in this case.)

- Ideally, fact values have some aggregation associated with them. BI is used to quickly identify trends, do rollups, etc.

Based on the comments from your original post, I'd expect the following:

Customer dimension - tracks all characteristics of the customer

Time dimension - This is likely the "date sold", or when it was initiated.

Task Type dimension - type, category, etc.

Product dimension - details on the various financial products

The "data dimension" represents the application and its various stages:

Measures:

Time executing task

Application ID (rollup a distinct count of the application IDs)

Start date (rollup the minimum date)

End Date (rollup the maximum date)

Operator (rollup a distinct count of the operators)

Note: if the same task can occur multiple times for the same application, then you'll need to add a second time dimension.
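
As a rough illustration of the rollups listed above, here is a sketch against a flat task-level table. The table TaskDetail, its columns, the choice of minutes as the unit, and the grouping by task category are all hypothetical; the post describes the measures only in outline.

```sql
-- Hypothetical detail table: one row per task executed on an application.
CREATE TABLE TaskDetail (
    ApplicationId     INT         NOT NULL,
    OperatorId        INT         NOT NULL,
    TaskCategory      VARCHAR(50) NOT NULL,
    TaskStartDateTime DATETIME    NOT NULL,
    TaskEndDateTime   DATETIME    NOT NULL
);

-- Each measure rolled up the way the list above describes.
SELECT TaskCategory,
       SUM(DATEDIFF(MINUTE, TaskStartDateTime, TaskEndDateTime)) AS MinutesExecutingTask,
       COUNT(DISTINCT ApplicationId) AS ApplicationCount,  -- distinct count of application IDs
       MIN(TaskStartDateTime)        AS FirstStart,        -- minimum start date
       MAX(TaskEndDateTime)          AS LastEnd,           -- maximum end date
       COUNT(DISTINCT OperatorId)    AS OperatorCount      -- distinct count of operators
FROM TaskDetail
GROUP BY TaskCategory;
```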

I hope this helps.

Eric Mamet SSChampion Points: 11728 · Answer 4

Kind of... Makes me think!

I suggested using a data mart instead of fulfilling the client's report requests one by one, and we are not planning to use Analysis Services (cubes) for the time being.

In essence, I am just trying to create an easy base for reporting.

Our view of the data is also partial, as we are only an "intermediary" (the web front end to mainframe back ends). Our client uses their own data warehouse in-house.

I'll do a bit of reading

Thanks.

Alan G-436699 Mr or Mrs. 500 Points: 534 · Answer 5

Hi Eric,

Your Fact Table looks just right to me. You simply need a query to answer your question. I see no reason why you can't have a query that interrogates the Fact table twice - something like:

SELECT DISTINCT FACT_FROM.TaskApplication
FROM FACT_TABLE FACT_FROM,
     FACT_TABLE FACT_TO,
     DIMDATE DATE_FROM_START,
     DIMDATE DATE_TO_START,
     DIMDATE DATE_TO_END,
     DIMTASK TASK_FROM,
     DIMTASK TASK_TO
WHERE FACT_FROM.STARTDATETIME = DATE_FROM_START.DATETIME
  AND FACT_TO.STARTDATETIME = DATE_TO_START.DATETIME
  AND FACT_TO.ENDDATETIME = DATE_TO_END.DATETIME
  AND FACT_FROM.TaskApplication = FACT_TO.TaskApplication
  AND DATE_TO_START.DATETIME >= @DATE_FROM
  AND DATE_TO_END.DATETIME <= @DATE_TO
  AND DATE_FROM_START.DATETIME <= DATE_TO_START.DATETIME -- Make sure X occurs before Y
  AND FACT_FROM.TASK_TYPE_KEY = TASK_FROM.TASK_KEY
  AND FACT_TO.TASK_TYPE_KEY = TASK_TO.TASK_KEY
  AND TASK_FROM.TASK_TYPE = 'X'
  AND TASK_TO.TASK_TYPE = 'Y'

I know the query above is not perfect and I've not used ANSI joins etc but hopefully it shows how you can get the answer to your question using your chosen fact table structure.
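
For comparison, the same query might be written with ANSI joins roughly as below. The table definitions are minimal assumptions added only so the sketch stands alone (the data types and the bracketed [DATETIME] column are guesses at the intended schema), and the parameter values are placeholders.

```sql
-- Assumed minimal schema matching the names used in the query above.
CREATE TABLE DIMDATE    ([DATETIME] DATETIME NOT NULL PRIMARY KEY);
CREATE TABLE DIMTASK    (TASK_KEY INT NOT NULL PRIMARY KEY, TASK_TYPE VARCHAR(50) NOT NULL);
CREATE TABLE FACT_TABLE (TaskApplication INT      NOT NULL,
                         TASK_TYPE_KEY   INT      NOT NULL,
                         STARTDATETIME   DATETIME NOT NULL,
                         ENDDATETIME     DATETIME NULL);

DECLARE @DATE_FROM DATETIME, @DATE_TO DATETIME;
SET @DATE_FROM = '20080101';   -- placeholder period
SET @DATE_TO   = '20080131';

SELECT DISTINCT FACT_FROM.TaskApplication
FROM FACT_TABLE AS FACT_FROM
INNER JOIN FACT_TABLE AS FACT_TO
        ON FACT_TO.TaskApplication = FACT_FROM.TaskApplication
INNER JOIN DIMDATE AS DATE_FROM_START
        ON DATE_FROM_START.[DATETIME] = FACT_FROM.STARTDATETIME
INNER JOIN DIMDATE AS DATE_TO_START
        ON DATE_TO_START.[DATETIME] = FACT_TO.STARTDATETIME
INNER JOIN DIMDATE AS DATE_TO_END
        ON DATE_TO_END.[DATETIME] = FACT_TO.ENDDATETIME
INNER JOIN DIMTASK AS TASK_FROM
        ON TASK_FROM.TASK_KEY = FACT_FROM.TASK_TYPE_KEY
INNER JOIN DIMTASK AS TASK_TO
        ON TASK_TO.TASK_KEY = FACT_TO.TASK_TYPE_KEY
WHERE TASK_FROM.TASK_TYPE = 'X'
  AND TASK_TO.TASK_TYPE   = 'Y'
  AND DATE_TO_START.[DATETIME] >= @DATE_FROM
  AND DATE_TO_END.[DATETIME]   <= @DATE_TO
  AND DATE_FROM_START.[DATETIME] <= DATE_TO_START.[DATETIME];  -- X starts no later than Y
```

Like the comma-join version, this only checks that a type X task started no later than a type Y task for the same application; it does not require Y to be the very next task.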

Eric Mamet SSChampion Points: 11728 · Answer 6

I must spend a bit of time looking into the factless fact idea. I am in no rush.

However, if I went for the initial solution, I suspect I would still need to work out the ranking between tasks, because the precise moment an application "moves" from a task of type X to a task of type Y is when the first task of type Y following a task of type X starts...

I would then have a query like

SELECT DISTINCT TaskFrom.ApplicationKey
FROM TaskFact TaskFrom
INNER JOIN TaskFact TaskTo ON TaskFrom.ApplicationKey = TaskTo.ApplicationKey
    AND TaskFrom.TaskRank = TaskTo.TaskRank - 1
    AND TaskFrom.TaskType = 'X'
    AND TaskTo.TaskType = 'Y'
WHERE TaskTo.Task_StartDateTime >= @MinDate
    AND TaskTo.Task_StartDateTime <= @MaxDate

I suspect the factless fact idea is better because it's all worked out upfront in the ETL and the final query becomes straightforward, which is pretty much my goal as we won't use cubes.
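
One way the ETL step could work out the ranking and pair up consecutive tasks is sketched below, using ROW_NUMBER() (available from SQL Server 2005). The staging table StageTask and its columns are hypothetical, and ordering tasks by start time is an assumption about what "rank" should mean.

```sql
-- Hypothetical staging table: one row per task, as extracted from the source.
CREATE TABLE StageTask (
    ApplicationKey     INT      NOT NULL,
    TaskKey            INT      NOT NULL,
    Task_StartDateTime DATETIME NOT NULL,
    Task_EndDateTime   DATETIME NULL
);

-- Rank each application's tasks by start time, then join rank n to rank n + 1.
-- Note: tasks of one application starting at the same instant get an arbitrary order.
WITH Ranked AS (
    SELECT ApplicationKey,
           TaskKey,
           Task_StartDateTime,
           Task_EndDateTime,
           ROW_NUMBER() OVER (PARTITION BY ApplicationKey
                              ORDER BY Task_StartDateTime) AS TaskRank
    FROM StageTask
)
SELECT prev.ApplicationKey,
       prev.TaskKey            AS FromTaskKey,
       prev.Task_EndDateTime   AS FromDateTime,
       nxt.TaskKey             AS ToTaskKey,
       nxt.Task_StartDateTime  AS ToDateTime
FROM Ranked AS prev
INNER JOIN Ranked AS nxt
        ON nxt.ApplicationKey = prev.ApplicationKey
       AND nxt.TaskRank       = prev.TaskRank + 1;
```

Each row of the result is one transition, so it could be loaded into a from/to factless fact table after swapping the datetimes for date dimension keys.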