July 20, 2015 at 7:57 pm
I'm trying to get my head around implementing a basic data warehouse schema for a reporting data warehouse. I had decided to use a star schema due to it being a commonly accepted (best?) practice DW schema. I've also been convinced that it's a good idea to have a date dimension (in-part due to Brent Ozar's "Why you simply must have a date table").
In the data warehouse I'm designing, the client/customer table has a client DOB field. My understanding is that if I want to store the weekday, quarter, year, monthday, etc. for the client DOB, and I want to use a star schema, I'd need to add those date calculations to the client; as by definition a star schema rules out a look-up from a dimension.
Am I understanding this issue correctly, that the only decent solutions are either to add the extra fields for the calculated date fields to the client table, or concede that a hybrid schema may work better for dimensions with this issue, allowing me to use that Date table at the cost of the extra join and break in adherence to the star schema?
July 20, 2015 at 8:19 pm
Why do you care about birthdays in your data warehouse? (might be a good reason, I don't know!)
What are you hoping to be able to query/get information about in your data warehouse? Is it sales?
Normally, you join a Date dimension table to your fact table so you can summarize sales (or whatever fact your fact table describes) in any way you want (Year, Quarter, Month, Day Of week,... etc). Really has nothing to do with birthdays.
July 20, 2015 at 9:17 pm
For some companies, for example those dealing in health, I'd need to know how old a client is on a certain date, to identify that I would need to store the date of birth, and to avoid having to get a separate look-up to the source system I'd need it in the data warehouse.
An example: how many clients under the age of five were given a certain medication in the month of August, 2014? I'd need to know which clients were given the medication between 1st August 2014 and 31st August 2015 and were under 5 on 31st August, 2014.
Is my thinking on this out-of-whack?
July 21, 2015 at 9:47 am
Andrew P (7/20/2015)
For some companies, for example those dealing in health, I'd need to know how old a client is on a certain date, to identify that I would need to store the date of birth, and to avoid having to get a separate look-up to the source system I'd need it in the data warehouse.An example: how many clients under the age of five were given a certain medication in the month of August, 2014? I'd need to know which clients were given the medication between 1st August 2014 and 31st August 2015 and were under 5 on 31st August, 2014.
Is my thinking on this out-of-whack?
This is a pretty common requirement in health-related systems. I would recommend that you add a calculated column to your client/patient dimension, which calculates the age dynamically when queried. Assuming that you have the client/patient key in your fact table, having the age attribute in the dimension would allow you to do the above.
Your last statement somewhat contradicts the very first post, in the sense that you probably wouldn't need to use the date of birth in an aggregated form (month, year, etc.). If you do, consider adding the date of birth's key to your fact table(s). That will allow you to aggregate and slice by the date of birth's month, year, etc.
Hope this helps.
July 29, 2015 at 5:17 am
I think you are a little bit confused with the topic.
You have to differentiate between dimension attributes and dimensions.
If you want to analyze for instance, how many patients have visited a clinic the last 6 month, then your fact table will have a column with the count of visits by the dimensions clinics and patients.
If you want to filter this analysis using the DOB of the patients, you just need a dimensional attribute in the patients dimension and then using your OLAP apply the correspondent filter.
If you need to calculate every day the exact age of a patient, you can implement a function in the view that feeds the dimension and then process it daily.
You can even discretize the patient dimension using buckets to categorized them in age ranges, i.e., under 25 year, from 25 to 35, older than 35, etc.
If the DOB is so important in your analysis, another option is create a separated dimension DOB and then during the loading of your fact table lookup the correspondent Id in the date dimension and book every transaction with the correspondent DBOId.
Hope that helps,
July 30, 2015 at 10:25 am
You could have just the DOB as a date in your client dimension and the join with the date dimension when required. If you join to the date dimension in building SSAS cube(s) then I believe this instance of the date dimension would be referred to as a referenced dimension.
Another option is to include a ClientDOB_key in your fact table that points to the proper record in your date dimension.
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply