October 15, 2007 at 3:07 am
Hi All,
I have been asked to design a Kimball method star schema data warehouse datamart for my employer. I'm confident about doing this except one thing:
Most fact tables I have come across seem to join to a date dimension via a date_key. I understand the reason for this as it makes it very easy to join to the date dimension to pull off, say, all sales that occur on a Friday.
However, time has moved on and most databases seem to offer a full range of date functions that allow you to query dates easily.
Therefore, in summary, should I put date_keys on my fact tables or should I use the datetime datatype?
Thanks guys.
October 15, 2007 at 8:03 am
You will still want to build the Date Dimension. I think you will find this any of Kimballs books as well, as well as an explaination as to why.
😎
October 15, 2007 at 8:15 am
I have a Date dimension table in my data warehouse. The primary key is a surrogate key. So all the fact table is using that Date table.
October 15, 2007 at 11:57 am
There are many good reasons to have a date dimension table, so you might as well have one.
The following should give you a good start on creating a date dimension table. It has about every date attribute I could think of. It does not have fiscal year/fiscal month/fiscal week info, since that is usually specific to an organization.
Date Table Function F_TABLE_DATE
October 16, 2007 at 3:12 am
There are many situations when you can use built in functionality to determine the day of the week or the month of the year and so have no apparent need for a date dimension.
However, there may be situations where getting the right value is not straight forward and in these situations a date dimension is worth its weight in gold.
For example, in the UK the tax year runs from April to March so creating a time series becomes a bit more complicated - not impossible but I don't believe that there is an in built function to do that. In a date dimension you can create the appropriate columns and populate them and then everyone is using the same values when reporting - no one has to recreate the values each time they do a report. I know a company that has its company year running from August to July - it doesn't even fit into traditional quarters!
Another example, if you need to determine the week of the year (Sunday to Saturday) and the 1st of January is a Saturday does that count as week 1 of the current year or week 53 of the previous year? The business might decide that it is the latter. Putting the value onto a date dimension means that it will always be reported as the business wants it reported rather than relying on individuals remembering that they cannot use a built in WEEK function for a particular year because it does not give the correct answer.
At the end of the day, a date dimension gives you much more flexibility to deal with the unknowns - I know of a project that was running late and so they extended the first quarter so that it ran from January to April.
Jez
October 18, 2007 at 6:12 am
Hi All,
Your answers are unanimous for the date dimension. Thanks - especially to Jez for some good real world reasons. My DW will have a date dimension and my users will be advised why it is so good for them!
Cheers, Alan.
October 26, 2007 at 8:19 am
Hi,
I'm about to design a Time/Date Dimension (Day level) for a customer and I'm convinced of doing it the Kimball way is as close as perfect you could get. Roughly. I'm not convinced that the use of a surrogate key is the best choice. In Kimball's "The Data Warehouse Toolkit 2ed" he suggests that a surrogate key is preferred.
One reason is size. But a smalldatetime (yes, SQLServer) and an Int are both 4 bytes.
Another reason is how to handle Unknowns. With Surrogate keys you'd have the value -1 for unknowns but why not have the value '1900-01-01' for unknowns and '1995-01-01' for "Early dates", etc? You'd still have the recommended DATE_TYPE {'Normal','NA',..} field to separate the unknowns from the normal dates.
A reason to use the smalldatetime is that it's much easier as a developer to deal with in the development process (where you don't have to write joins all the time) and, in the end, saves time.
Another benefit of using a smalldatetime instead of a surrogate key is that you don't have to worry about getting the keys in the correct chronological order.
Now my question. Can you come up with any other reason why I should the surrogate key on my Time Dimension?
Note, I'm all for using Surrogate keys in all the other kind of dimensions.
Thanks
October 26, 2007 at 8:28 am
Date and time data types for representing date and time of day.
datetime
Date and time data from January 1, 1753 through December 31, 9999, to an accuracy of one three-hundredth of a second (equivalent to 3.33 milliseconds or 0.00333 seconds). Values are rounded to increments of .000, .003, or .007 seconds, as shown in the table.
Example Rounded example
01/01/98 23:59:59.999 1998-01-02 00:00:00.000
01/01/98 23:59:59.995,
01/01/98 23:59:59.996,
01/01/98 23:59:59.997, or
01/01/98 23:59:59.998 1998-01-01 23:59:59.997
01/01/98 23:59:59.992,
01/01/98 23:59:59.993,
01/01/98 23:59:59.994 1998-01-01 23:59:59.993
01/01/98 23:59:59.990 or
01/01/98 23:59:59.991 1998-01-01 23:59:59.990
Microsoft® SQL Server™ rejects all values it cannot recognize as dates between 1753 and 9999.
smalldatetime
Date and time data from January 1, 1900, through June 6, 2079, with accuracy to the minute. smalldatetime values with 29.998 seconds or lower are rounded down to the nearest minute; values with 29.999 seconds or higher are rounded up to the nearest minute.
--returns time as 12:35
SELECT CAST('2000-05-08 12:35:29.998' AS smalldatetime)
GO
--returns time as 12:36
SELECT CAST('2000-05-08 12:35:29.999' AS smalldatetime)
GO
**********************************************
Date and time data consists of valid date or time combinations. For example, valid date and time data includes both "4/01/98 12:15:00:00:00 PM" and "1:28:29:15:01 AM 8/17/98". Date and time data is stored using the datetime and smalldatetime data types in Microsoft® SQL Server™ 2000. Use datetime to store dates in the range from January 1, 1753 through December 31, 9999 (requires 8 bytes of storage per value). Use smalldatetime to store dates in the range from January 1, 1900 through June 6, 2079 (requires 4 bytes of storage per value).
October 29, 2007 at 7:16 am
In Kimball solutions I have seen, the surrogate date_key tends to only exist on the fact tables. On all the dimension tables they tend to be normal DATE datatypes. Dates on fact tables tend to always be known. If the fact table stores a fact START_DATE and a fact END_DATE where the END_DATE is unknown, then this is usually set to zero i.e. it points to the dummy row on the date_dimension where the DATE_KEY on the date_dimension is zero.
On all the dimensions, where dates are stored as DATE datatypes, unknown dates will simply be NULL.
I certainly would not want to join to the date_dimension for every single date in the schema. However, it is useful for joining important dates such as SALES_DATE - as aggregation of sales figures may be needed by week or month number. But then SALES_DATE is likely to be on the SALES_FACT_TABLE anyway!
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply