November 9, 2015 at 10:57 am
Sorry for the lame subject, but here goes.
I have a basic sales data warehouse. Basically, a salesperson earns a commission when a customer in his territory buys product. The catch is that the territory "borders" can get redrawn every year. So it would be necessary to "freeze" a definition of the territory map to get an accurate idea of what's going on in the business. (What customers in what territories are buying more/less etc).
The "customers" in my case are companies (hospitals and clinics), so they don't move like people do... How would I model this situation so that I could look at a single customer across time and at a territory with a fixed "definition" across time. (I suppose that would mean a membership that was fixed as of a point in time.... so something like "use the definition of the sales map that was valid on a given day").
It would seem that it would be something like:
Territory--(1,M)--Customer--(1,M)--SalesFact
So how would I specify when the "Customer in Territory" was valid?
Thanks!
Pieter
November 9, 2015 at 12:58 pm
Have you read about slowly changing dimensions type 2? It seems to be what you need.
November 9, 2015 at 1:05 pm
Yes, but I clearly don't understand it.
Basically, my understanding is that I would create a new Customer record (with a new Primary Key, but with the same Alternate Key)... it would get updated whenever the customer's sales territory changes. Maybe my problem is that I'm too accustomed to OLTP type databases as opposed to Data Warehouses...
Thanks,
Pieter
November 10, 2015 at 3:14 pm
In addition to what you have said above, your customer dimension should also contain effective/termination dates and preferable a "current record" indicator.
The effective/termination dates should be used when populating your fact table, to pick the correct version of your customer when the transaction occurred and using that customer record's surrogate key in the fact record.
Territory should not be its own dimension table, unless you have a really good reason to do so. It should rather be an attribute in the salesperson dimension. If that is done, a simple group by territory and customer would then give you the breakdown you need. (Also assuming that you have a date dimension that is referenced by your facts).
Edit: Corrected my statement about the territory attribute...it should be in the salesperson/employee dimension, as that is where you would track it. The fact record would tie territory to customer.
November 10, 2015 at 3:47 pm
I think SCD2 is what you need. As noted by Martin, you need start/end dates, whatever you call them. I'd also note that you might link salespeople to territories as an SCD 2 as well.
All your queries need to look at dates if they span time. They should look at the begin and end dates, with the end date of null meaning it's currently active.
Some links here: http://www.sqlservercentral.com/search/?q=slowly+changing+dimension&t=a&sort=relevance
November 10, 2015 at 4:14 pm
Steve Jones - SSC Editor (11/10/2015)
All your queries need to look at dates if they span time. They should look at the begin and end dates, with the end date of null meaning it's currently active.
Mr. Jones, although a matter of preference I have to respectfully disagree about the usage of null values for end dates. In my experience, we typically try to avoid null values as much as possible for any dimensional attribute. Using a future-dated default for current record end dates (like 12/31/9999) usually works pretty well.
November 11, 2015 at 8:50 am
I thought about a future date. Using 2999/12/31 or something makes sense as well.
I've done it both ways. I guess as I think about it, the end date as null or a fixed future date probably performs the same in terms of indexing and querying, so why not use a future date. I'll cease recommending NULL.
November 11, 2015 at 9:36 am
Steve Jones - SSC Editor (11/11/2015)
I thought about a future date. Using 2999/12/31 or something makes sense as well.I've done it both ways. I guess as I think about it, the end date as null or a fixed future date probably performs the same in terms of indexing and querying, so why not use a future date. I'll cease recommending NULL.
It wasn't a "cease and desist" comment, but I'll take it 🙂
The main problem with using NULL values for effective dates in dimensions, is the amount of effort it takes to work around the NULL values later on.
For instance, queries that load your fact tables and referencing a type-2 dimension will have to use the "between" operator in order to assign the correct dimension record as reference to the fact. If you have NULL values, you will have to use the ISNULL function in each one of those where clauses to deal with the NULL values... and that's a lot of additional work.
November 11, 2015 at 10:35 am
I agree with you. As I was writing the first time, I was thinking, what about the ISNULL code that's needed? Then I saw your comment, and I started to wonder why I'd ever done that. Maybe the disdain for a "magic" date?
I'll happily give the cease and desist. Learning and writing better code over time is what I want to promote here. Happy to follow my own path.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply