Building hierarchy from the denormalized fact dimension

Question

Building hierarchy from the denormalized fact dimension

Hawkeye_DBA

SSCarpal Tunnel

Points: 4365
More actions
January 7, 2011 at 9:10 am

#229915

I'm new so bear with me please.. this may be an easy question 🙂
I have a simple Cube:
1 fact table
10 dimensions
My fact table is a denormalized table from 4 sources. The repeating key is my CallTableID. So, the data looks like this:
CallTableID 1:M ConnectTableID
ConnectTableID 1:M QueueTableID
QueueTableID 1:M ActivityTableID
Where CallTableID is the repeating factor.
Under each ID above, there are other supporting facts in the fact table like CallDate, CallExtension, CalledNumber, CallDuration, ConnectStartTime, ConnectEndTime, ConnectExtension, ConnectDuration, ConnectHoldTime, QueueStartTime, QueueDuration
I'd like to be able to show measures like:
Total Distinct Calls by CallTableID
Total Connects by CallTableID
Total Queue Calls by CallTableID
Average ConnectHoldTime by CallTableID
Etc...
How do I structure my hierarchy for this?
Thank you in advance for your help!
Sandy

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Chris Campbell-415954 SSCrazy Points: 2248 More actions · Answer 1

Is CallTableID the key of dimension you want to roll your measures up to and you want to build a hierarchy to drill down on Connect, Queue and Activity?

Your fact table should contain only dimension keys and measures (numbers you want to report on). Each dimension table should join to the fact table on the dimension keys. If you have a natural relationship between the Call, Connect, Queue and Activity dimensions (assuming they are dimensions), you can build a drill-down hierarchy on them in the dimension editor by creating attribute relationships between the keys.

Your fact table should store data at the granularity of the lowest (most detailed) level of the hierarchy. Analysis Services will automatically handle rolling up the measures to the levels of your hierarchy when you query the cube by the dimension that contains it.

Hawkeye_DBA SSCarpal Tunnel Points: 4365 More actions · Answer 2

Thank you for your reply, sorry I think I may have given the wrong description.

The reason I included the call, connect, and queue ids on the fact table was to reference the source data, but the fact table data contains daily facts about calls, connects, and queues. I didn't want to create 3 fact tables since that would be like replicating the OLTP environment, and I thought BI processing would be better on a "flattened" table... however I am new so maybe you can help me architect a better solution?

here's what I have today:

My Fact table has data like this:

FactTableID

CallTableID

ConnectTableID

QueueTableID

CallDurationSec

ConnectDurationSec

QueueDurationSec

ConnectHoldTime

ConnectRingTime

ConnectTalkTime

ConnectPartyID

QueueExtensionKey

CallExtensionsKey

...

FactTableID CallTableID ConnectTableID CallDuration ConnectDuration CallExtensionKey CallTypeKey

1 1234 11111 2 0 55555 1

2 1234 11112 1 1 55555 2

My Dimension tables are like this:

DimExtensions

ExtKey ExtAltKey ExtName

55555 3221 Susie

DimCallTypes

TypeKey TypeAltKey TypeName

1 1 Inbound

2 2 Outbound

Chris Campbell-415954 SSCrazy Points: 2248 More actions · Answer 3

What I don't see in your example is a dimension table called DimCall. If you want to roll up by CallTableID, you will need a dimension that allows you to aggregate on that key.

Hawkeye_DBA SSCarpal Tunnel Points: 4365 More actions · Answer 4

Thank you for your help

Please bear with me as I want to make sure I'm understanding this right - sorry, this will consistute a HUGE change to my current architecture so I want to be sure I'm really getting you right.

I cannot do a GROUP BY for an aggregation within my cube on my denormalized table?

So, I have to have a table for my calls, which will grow insurmountably because this will be added to every day with my ETL. and then another table for my Connections, and probably another table for my Queue Calls?

I thought that Dimensions were supposed be slowly changing data sets that provided "where clause" for BI? Like Call type, Extension, Call Date, etc.

Would you please help me and post an example of what you might suggest, it would be really helpful 🙂 Here are the table structures of my original data that are the "main" tables in the OLTP source. The other tables are all Dimension tables that provide the details of CallType, BillingCodes, PortNames, etc.

thank you again for your help, this is such a switch from traditional design and I'm running on a tight timeframe to get this done - by myself without training 🙁

CREATE TABLE [dbo].[Call](

[CallTableID] [bigint] NOT NULL,

[SIPCallId] [nvarchar](50) NULL,

[StartTime] [datetime] NOT NULL,

[StartTimeMS] [int] NOT NULL,

[EndTime] [datetime] NULL,

[EndTimeMS] [int] NULL,

[CallNote] [nvarchar](96) NULL,

[BillingCode] [nvarchar](50) NULL,

[FriendlyBillingCode] [nvarchar](75) NULL,

[Extension] [nvarchar](50) NULL,

[Duration] [datetime] NULL,

[CallType] [smallint] NOT NULL,

[WorkgroupCall] [int] NULL,

[LongDistance] [int] NULL,

[DialedNumber] [nvarchar](50) NULL,

[CallerID] [nvarchar](50) NULL,

[Archived] [int] NULL,

[MediaStreamID] [bigint] NULL)

CREATE TABLE [dbo].[Connect](

[ConnectTableID] [bigint] NOT NULL,

[PartyType] [int] NOT NULL,

[CallTableID] [bigint] NOT NULL,

[LineID] [bigint] NOT NULL,

[SwitchID] [bigint] NOT NULL,

[PortNumber] [int] NOT NULL,

[PortID] [bigint] NOT NULL,

[PortName] [nvarchar](255) NULL,

[GroupID] [bigint] NOT NULL,

[GroupName] [nvarchar](255) NULL,

[ConnectTime] [datetime] NULL,

[ConnectTimeMS] [int] NULL,

[DisconnectTime] [datetime] NULL,

[DisconnectTimeMS] [int] NULL,

[ConnectReason] [int] NOT NULL,

[DisconnectReason] [int] NOT NULL,

[PartyIDFlags] [int] NOT NULL,

[PartyID] [nvarchar](255) NULL,

[PartyIDName] [nvarchar](255) NULL,

[PartyIDLastName] [nvarchar](255) NULL,

[CtrlPartyIDFlags] [int] NOT NULL,

[CtrlPartyID] [nvarchar](255) NULL,

[CtrlPartyIDName] [nvarchar](255) NULL,

[CtrlPartyIDLastName] [nvarchar](255) NULL,

[MailboxID] [nvarchar](255) NULL,

[RelatedCallTableID] [bigint] NULL,

[TalkTime] [datetime] NULL,

[TalkTimeSeconds] [bigint] NULL,

[HoldTime] [datetime] NULL,

[RingTime] [datetime] NULL,

[Duration] [datetime] NULL,

[LongDistance] [int] NULL,

[TrunkDirection] [int] NULL

)

CREATE TABLE [dbo].[Queue](

[QueueTableID] [bigint] NOT NULL,

[CallID] [bigint] NULL,

[ConnectTableID] [bigint] NULL,

[StartTime] [datetime] NULL,

[Duration] [datetime] NULL,

[DurationSeconds] [bigint] NULL,

[QueueName] [nvarchar](75) NULL,

[QueueDN] [nvarchar](50) NULL,

[ExitReason] [smallint] NULL,

[TargetType] [smallint] NULL,

[TargetFirstName] [nvarchar](100) NULL,

[TargetLastName] [nvarchar](100) NULL,

[TargetDN] [nvarchar](50) NULL,

[Archived] [tinyint] NULL

)

CREATE TABLE [dbo].[AgentActivity](

[AgentActivityTableID] [int] NOT NULL,

[QueueCallID] [int] NULL,

[AgentDN] [nvarchar](50) NOT NULL,

[AgentFirstName] [nvarchar](100) NULL,

[AgentLastName] [nvarchar](100) NULL,

[State] [int] NULL,

[WorkgroupDN] [int] NULL,

[WorkgroupName] [nvarchar](100) NULL,

[StartTimeStamp] [datetime] NULL,

[EndTimeStamp] [datetime] NULL

)

Hawkeye_DBA SSCarpal Tunnel Points: 4365 More actions · Answer 5

I did some more drawing..

I think I understand what you're saying. My facts, are thinks like CallDuration, QueueTime, HoldTime, RingTime, etc.

My Dimensions are the Call Details, and perhaps it will need to be a snowflake schema, with extra dimensions for the Call Details Dimension, like DimExtensions, etc.

Is this on the right track now?

Thanks again for your help!

Chris Campbell-415954 SSCrazy Points: 2248 More actions · Answer 6

You're definitely on the right track now.

Facts are generally transactional records from your source systems. They contain the measures (numbers) that you want to summarize combined with the keys for to the related dimension tables.

You are correct that dimension tables USUALLY contain a relatively small number of rows and are USUALLY slowly changing. However, in some cases, we need dimensions with large numbers of rows to produce the type of reporting you are trying to get to (in this case, by Call ID). A diminsion that is identified only by a key value and with no other attributres is referred to as a "degenerate" or "fact" dimension and can be created in BIDS using the "fact" relationship type on the Dimension Usage tab of the cube editor.

The classic example of a degenerate dimension is in a sales system where users want to be able to summarize by Invoice Number. By itself, Invoice Number doesn't really have any attributes that go along with it but users want to be able to see a report by Invoice will all the line items rolled together. I think that this is similar to your case where you want to see all of the details for the Call ID rolled together.

If you haven't already, pick up Ralph Kimball's book the "Data Warehouse Toolkit". It covers many business scenarios and how to model them for BI.

If you would be interested in mentoring, training or other BI consulting services (shameless plug), please let me know and we can talk off-line.

Hawkeye_DBA SSCarpal Tunnel Points: 4365 More actions · Answer 7

Thank you so much for your quick replies, you've helped me tremendously!

I am definately intereseted in mentoring, if you can send me an email offline with your contact info and cost so I can swing it by the management that'd be great!

Before you go - one more point of clarification for me if you don't mind...

do I need to change the current structure of my tables or just make the new CallTableID dimension for rollup purposes?

Thanks again, I'll check for your email and post back!

Chris Campbell-415954 SSCrazy Points: 2248 More actions · Answer 8

No problem! It's a different way of thinking and takes some getting used to.

If you need to be able to drill-down along a pre-determined path (e.g. Call to Connect to Queue...), you will need a seperate dimension...probably snowflaked as you alluded to earlier or denormalized into a single table. Also, if there are other attributes about a Call that you want to report on, that points to a separate dimension. On the other hand, if the only thing you want to summarize by is Call ID, you can use the "fact" relationship type and build the dimension right off of your fact table. FYI...you can only have one "fact" relationship per measure group in SSAS.

Good luck!