Changing dates to Integers

  • Patrick Seegers (6/9/2010)


    [font="Courier New"]Hello,

    I have found storing the date as an integer to be useful as an integer surrogate key to a date dimension table with attributes you can not easily get from tsql, for example business events tied to specific dates. (ex. a HolidayFlag or something descriptive text about the day) (Kimball Method as mentioned by others above)

    ...

    Are there any opposing views on this usage?

    Thank you

    [/font]

    There is no reason that you can't have a date dimension table with a date as a primary key. I am aware of Kimball’s arguments, but a date is such a well established entity that there is unlikely to be any major re-definition that would invalidate it. With the introduction of the DATE datatype in SQL Server 2008, the storage required is actually smaller than an integer. The only true advantage of an integer date id would be for cases where you have to support dates outside of the range that SQL Server supports (0001-01-01 through 9999-12-31 for SQL Server 2008 DATE). This is very unlikely for business oriented data warehouses.

    One thing I am definitely against is taking a datetime value and separating it into a date key and a time key, unless you also store the original datetime value. If you only have a date key and time key, then doing a query to get data within a datetime range ( Example: 2010-06-02 14:30:00.00 through 2010-07-13 15:15:00.000) is hard to write and very inefficient.

    Date Table Function F_TABLE_DATE on the following link was designed to allow you to have either an integer ID or a date as the primary key if you use it to load a date dimension table. It has at least four columns that could be used as a primary key to identify a date: DATE_ID, DATE, JULIAN_DATE, and MODIFIED_JULIAN_DATE

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=61519

  • the only true advantage of an integer date id would be

    I would argue that there is a much larger advantage but it is relevant only when the data is to be fed into SSAS. Having an integer key rather than a datetime data type key allows for simpler MDX scripts where these scripts need to refer to the internal unique id of the member. This is especially evident when workig with multiple date formats (e.g Aus versus US), where the determination of the current date results in a local rendition of the date format, which may/may not match the format of the key as it's held in the dimension.

    @jeff M - re: using NULL for the date - where were you suggesting that this reside? In the fact and the dimension? If it's the dimension then I see little point as a 'standard' query written to join the fact to the dimension using FK/PK fields int he join clause will happily exclude all of my unknown rows (ie NULL != NULL). If it's in the dimension only, then I now loose all of the 'business intelligence' added to my time dimension - that is, if my query returns attributes of time such as DOW or IsHoliday etc, then either I have to change my query to handle the existence of nulls or I simply miss reporting on that data again. In there instances, having a 'valid' unknown item (if using integer keys, then sure, go for -1; if using dates, sure make it 9999-12-31) in the dimension definitely helps. My POV is that using an integer, with a -1, allows for a truly surrogate key (versus meaningful key) but as you pointed out, that's just one POV out of many thousands in which I'm sure many would find arguments against this approach 🙂

    Steve.

Viewing 2 posts - 16 through 16 (of 16 total)

You must be logged in to reply to this topic. Login to reply