April 28, 2011 at 7:39 pm
I am trying to design a data warehouse for licensing vendor, who sells licenses on ecommerce and various other venues. The things they want to track are sales, product lifecycle and activity. What this means is that there are different sale types (such as new purchase, promotional purchase, renewal) and different events/states of a license, such as - a license can get installed, renewed, activated, registered. A license can get renewed many times (on different dates).
So I was thinking my dimensions would be very simple - date, product, source, saletype and event/state. I would have two fact tables; one would be for sales, and another would be for the events, both of them having foreign keys to the dimension tables. My fact tables would be an accumulating fact table, where every event would add a new row - hence, the licenses can be repeated. However, the requirements states that they be able to cross reference these two facts and the saletype and event dimensions. For example, If someone sees that product 'A' has 100 sales in the US ecommerce store of type 'new purchase', then they want to see how many of 'those' 100 licenses also got activated... and then maybe they would want to see, out of the people that activated, how many have registered... and then (back to saletype) how many of those that registered, how many of them 'renewed'. And I cannot really define a heirarchy, because you could have a whole lot of combinations of these....
How can I do this? As I'm reading, I find there seems to be no way to relate the two facts based on the license itself (which is what I need to do).
note: I am using sql server analysis services and sql server 2008 r2
April 28, 2011 at 8:22 pm
I may be misunderstanding the situation, but if the sale event is at the grain of 'count of licenses' but the action fact is at the license instance grain, then I don't think you'll be able to bring the event accuracy to the license. A completely non-related example could be - you sell apples, today you sold 100 apples to steve, you want to know which ones he ate. Unless you record some identifier for each of the 100 apples, then when he eats 30 of them, all you know is that 30 have been consumed. Where they the first 30? the last? some other grouping? you can't tell. It's the same with licensing if the license sales are tracked at an aggregate level rather than detail.
So, if you vendor distributes 100 unique identifiers to their clients, or perhaps even makes the clients register in some way (and this registration is captured on each event) then you can start to make relationships between the sold item and the used item. Otherwise, you're somewhat left having to report aggregated results when looking at sales _and_ events, and detail when looking at events only.
Steve.
April 28, 2011 at 8:49 pm
The grain for the sale event is not the count of licenses - it is also a license (or the sale itself). does that make sense? In the RDBMS, the tie between all of this is the license key.... I want to track both the events at the license level, if possible. So in your example, I 'would' like to track every apple... because they actually want to see each the lifecycle of each license....
May 1, 2011 at 10:00 pm
Ok, so I definitely misunderstood :hehe:
I'm not sure what the issue is that you're expecting? You can definitely use a License dimension to relate the two fact tables. Is the issue that you have no additional attributes to try and create hierarchy/ies above the individual license level?
Steve.
May 2, 2011 at 12:03 pm
I am not sure I agree with/understand your need for 2 fact tables. Renewed/renewal is listed in both. It seems to me that these are all events in the same theme and you are making things unnecessarily complicated by separating them. I would put everything in one transactional fact table with a record for each event (sale or licensing) that happens. Of course some of your dimensions would need to contain values for Unknown or N/A because not every event is going to relate to every dimension.
May 2, 2011 at 12:31 pm
stevefromOZ,
Thanks for your response. I have since decided to merge the two types of events into one (saletypes and events) as Daniel Stated - and they can heirarchical. So it solves one issue, which is that I don't really have to join between the two fact tables. But now I have a different problem (see below)
Daniel,
Thanks for your response - I have since decided to merge the events a while back - it saves me from doing the join. But now, the problem is that how do I make the measures that each of those events represent, as they still need to be related. For example, given I have a transactional fact table, I will have a row for each event. So for a license that was purchased in jan 2011, I will have one row. Then they may install (another event) - and then they may register (another event). When that row is up for expiration, I will have another row (with event ID for 'expired'). Then possibly, They user will be shown reminders (a whole bunch of events) or maybe even be sent an email (another event), and then at the end, actually renew (another event). How do you provide these numbers as measures now - given that I run a report to show me all the people who purchased in jan 2011 - and then I want to see how many of those people installed. And then how many of those registered, etc....
May 2, 2011 at 12:52 pm
distinct count, tied to the license ID?
Steve.
May 2, 2011 at 1:09 pm
Steve,
How do I do that? So each measure (such as Installed, Activated, Renewed, etc) would be a 'distinct count' - but how do you tie it to a license? Because given that I am querying by date, I will only pull up the records that I start with i.e. if I start with 'purchased', I will get 'x' amount of rows that purchased in january. Now if I show the 'Installed' measure, then I need that measure to show how of many of the 'x' purchased got installed. Is this possible?
May 2, 2011 at 3:42 pm
I was thinking more of the equivalent of "distinct count(License_ID)" ie a new, single measure that counts the distinct license instances. I can see that working when you're looking for the unique events (sales or otherwise) in a single period. If you also had an event count measure, then in a simple case, you could see maybe a unique_lic_count of 1 and event_count of say 4 (ie purchased, registered and so on). The analysis this may not work so well on, is if you had a distinct_count of say 20 and event count of 30, I'd have to build out a prototype to see how that would play out, as I'm thinking initially, it might simply tell you that "some" number of licenses had > 1 event tied to them in that period. Identifying the which ones, would prob require bringing the license dimension into the analysis also.
Steve.
May 3, 2011 at 10:40 am
How do you provide these numbers as measures now - given that I run a report to show me all the people who purchased in jan 2011 - and then I want to see how many of those people installed. And then how many of those registered, etc....
Well from a design point of view I would think you need an event types dimension that would elaborate on all the different types of events that can occur: purchase, install, register, etc.. I would also think you need a license dimension with all the requisite license properties such as License number, term, retail price, maintenance/renewal price, essential contract terms, etc..
Then you query your event transactions by license and event type constrained by a date range and I think you have what you want.
May 3, 2011 at 10:57 am
... with all the requisite license properties such as License number, term, retail price, maintenance/renewal price, essential contract terms, etc..
Would you not have the dollar items (price, cost) as a measure within the new fact? As noted above, I think you'd still need a count of some sort (whether you want distinct or not).
Steve.
May 3, 2011 at 1:08 pm
It seems to me that you have a fundamental disagreement between your requirements and your design. Based on what you are measuring it seems the fact should be an accumulating snapshot fact table. The way you are designing the fact is by keeping track of every state change for a license, making it a complete audit trail of every state that row was in. This might be useful at times if an accountant wants to know this information, but as a measure all it will get you is a count of audit rows, which isn't what you want. It is a fact that is not additive when counting the number of open licenses, unless you add a dimension such as Daniel suggests. The downside is that you will be forced to always filter by that dimension in every query or else you will get the wrong results.
The mistake you are making is that you are letting your transactional source database design drive the fact design. Instead your design should be autonomous[/url], independent of the source database.
Anytime you need to count an entity that has multiple state changes, then often an accumulating snapshot fact table is the answer. Here's a link to a Kimball article describing this type of fact table.
LinkedIn - http://www.linkedin.com/in/carlosbossy
Blog - http://www.carlosbossy.com
Follow me - @carlosbossy
May 3, 2011 at 4:08 pm
Would you not have the dollar items (price, cost) as a measure within the new fact? As noted above, I think you'd still need a count of some sort (whether you want distinct or not).
Yes, but....My thinking is that the transactional sale price, etc. would indeed be part of the fact table, but the retail price (like MSRP) for the specific software is not really additive but more dimensional property. Although upon further reflection I think that would more likely be part of a software dimension, although I could probably make a good case to make the license dimension a subset of columns of a software dimension. This is based on an assumption that there is more than one piece/title of software for sale.
Based on what you are measuring it seems the fact should be an accumulating snapshot fact table. The way you are designing the fact is by keeping track of every state change for a license, making it a complete audit trail of every state that row was in.
From a conceptual design point of view, an accumulating snapshot fact table would be the way to go. However there are many pitfalls.
--Accumulating snapshots are best used when the period across which data accumulates is reasonably brief. Here it seems like it could be years.
--The question of when to stop accumulating data crops up as well. A customer bought a license, then renewed it, what happens when they renew it again? Do you want to add multiple renewal columns to the fact table, or simply track the last renewal transaction? The wider your fact table gets, the worse it is to manage.
--Assuming this is all destined for a cube just about eliminates the practicality of an accumulating snapshot. How would you partition it? Every time you added new data to existing records you have to process the entire cube all over again. A potentially wide fact table, accumulating facts for months if not years would eventually run into processing time restrictions.
May 4, 2011 at 12:16 am
From a conceptual design point of view, an accumulating snapshot fact table would be the way to go. However there are many pitfalls.
--Accumulating snapshots are best used when the period across which data accumulates is reasonably brief. Here it seems like it could be years.
--The question of when to stop accumulating data crops up as well. A customer bought a license, then renewed it, what happens when they renew it again? Do you want to add multiple renewal columns to the fact table, or simply track the last renewal transaction? The wider your fact table gets, the worse it is to manage.
--Assuming this is all destined for a cube just about eliminates the practicality of an accumulating snapshot. How would you partition it? Every time you added new data to existing records you have to process the entire cube all over again. A potentially wide fact table, accumulating facts for months if not years would eventually run into processing time restrictions.
All excellent points. I believe this is where we would have to know the actual business requirements and the makeup of the source data better in order to make some of these decisions.
How many licenses are expected to be in this data mart? If the size is 'small' then a cube could be rebuilt completely, but if 'large' then an accumulating snapshot makes things more difficult. Does each renewal date need to be tracked? Is their a metric defined that requires us to report on the number of renewals each month, or the number of times a license has been renewed. This could be an aggregate column in the fact, or another fact altogether.
These decisions would be driven by detailed requirements that the architect needs to satisfy.
LinkedIn - http://www.linkedin.com/in/carlosbossy
Blog - http://www.carlosbossy.com
Follow me - @carlosbossy
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply