Date or int data type for clustered index ?

  • We have a data warehouse fact table which has integer data type for its date dimention, like 20100621, 20100622,....

    Eventually this table will have tens of billions records (currently it is just tens of millions). We will create a clustered index on it and partition by date.

    From performance standpoint, what would make more sence, to keep it as integer or to convert it to date data-type?

    Any pros and contra ?

    Thanks

  • I would think not converting it to date format would simply mean more work in other areas. I would assume you already deal with these things but it is much more work to query a date that is not stored in a date format. I would think converting it to date would simplify your life and I am not sure what the argument would be to store a date as an integer.

    Dan

    If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.

  • To deal with date, we have another table which is DimDate dimention table. For most of the queries we join fact to dim anyway. However, PK for dim table is also int.

  • My experience has been that, if it's a date, you should store it as a date. Otherwise, you guarantee additional work to convert it every time you want to actually use it.


    And then again, I might be wrong ...
    David Webb

  • I would think having to join to a Dim table to query date would take more resources then storing the date as a date.

    Dan

    If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.

  • It is known practice to use INT datatype to store dates especially in large datawarehouses... BUT!

    Not in the YYYYMMDD format!

    As in order to convert it back tp date you will need to convert to varchar first!

    I would suggest to keep it as INT, but convert it into serialized date number. In you case try:

    SELECT CAST(CAST(CAST(DateColumn as VARCHAR(8)) AS DATETIME) AS INT)

    You will get the number of days since 01 jan 1900.

    Why you do so:

    1. Int takes just 4 bytes instead of 8

    2. Range filters faster

    3. As serialized int value implicitly converts back to the datetime, you can use datetime functions directly on INT values. (select DATEADD(dd,1,40327))

    4. To convert it back to datetime single cast required (implicit converion is allowed)

    So, you got performance benefits and less required storage.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (6/25/2010)


    It is known practice to use INT datatype to store dates especially in large datawarehouses... BUT!

    Not in the YYYYMMDD format!

    As in order to convert it back tp date you will need to convert to varchar first!

    I would suggest to keep it as INT, but convert it into serialized date number. In you case try:

    SELECT CAST(CAST(CAST(DateColumn as VARCHAR(8)) AS DATETIME) AS INT)

    You will get the number of days since 01 jan 1900.

    Why you do so:

    1. Int takes just 4 bytes instead of 8

    2. Range filters faster

    3. As serialized int value implicitly converts back to the datetime, you can use datetime functions directly on INT values. (select DATEADD(dd,1,40327))

    4. To convert it back to datetime single cast required (implicit converion is allowed)

    So, you got performance benefits and less required storage.

    In SQL 2008 there is no storage benefit to storing a date as an integer, since the DATE datatype is 3 bytes, and the INT datatype is 4 bytes.

    I doubt the statement that an integer range will filter faster than a 2008 DATE without seeing some supporting tests.

  • You are right for SQL2008 the way to go is to use new DATE datatype to store a date.

    (My mistake, I was still in SQL2005 forum mode...)

    Interesting enough that the DATE type internally is stored as a 3 byte (24-bit) integer. However to convert from INT to DATE you can only via BINARY.

    Also, the following which used to work with DATETIME will not work with DATE:

    SELECT DateValue + 1

    SELECT DateValue1 + DateValue2

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Thanks, Michael, this is what I wanted to find out.

    I wanted to emphasize in my initial post that I am talking abount DATE (not datetime or smalldatetime) data-type, but at the last moment decided no to because this is 2008 forum.

  • Even though DATETIME saves the space in comparison to INT, this should not be the only reason to save the data in DATETIME and not in INT. I would focus more on how often the data is fetched by joining another table. If this is really done more often, the date should be stored as INT to avoid use of many resources while comaring the fields in case the date is stored in DATETIME.

  • Yes, it will be joined often with dimention table. But I don't see why INT will use less resources than DATE ? As Michael noted above, INT will take 4 bytes while DATE takes 3 bytes.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply