Index on concatenated text or different columns

  • Hi,

    We're designing a time dimension table for use in datamarts.

    The hierarchy contains year, month, week and day.

    We use snowflake physical design.

    On the lowest level, date table, is it better to create a clustered index on a concatenated YYYYMMDD integer field or on separate YYYY, MM, DD fields?

    The mayority of queries will contain date ranges (week, quarter, month, year)

    Thx,

    Geert

  • Is there a reason you're not using DateTime?

  • Yes, we do.

    Our data is on daily basis, we do not need the time part in our queries.

    A Datetimefield is 8 bytes, whereas an integer field containing YYYYMMDD format is only 4 bytes long.

    I'm thinking of using a YYYYDDD format (year + number of day in year) as the PK.

    With a clustered index on a YYYYMMDD field (or on a YYYY field, a MM field and a DD field), performance should be adequate.

  • The time dimension table contains a Datetime field, but it's a mere attribute.

    I have no idea if an index is more performant on a 4 byte field than on an 8 byte field (of which 4 bytes are always zero).

    If there's almost no performance degradation, I think placing the Datetime field is the most straightforward strategy.

  • You might be able to use smalldatetime, which gives you January 1, 1900, through June 6, 2079. That is only four bytes as well.

  • Our calendar table has a generated key as the primary key. The records in our data warehouse then contain the date key that corresponds to the date. The rest of the columns in the table contain are just multiple representations of the date. This scenario allows for queried to search by multiple different date representations and still offer great performance. We have columns for text representations (for reporting) Julian date, fiscal period and year, holiday indicator, etc. Since there are only 365 records per year this table will only contain a few thousand records in the table and it will most likely be pulled in in its entirity into cache on the first read of that table so I doubt indexes on columns will due to much to improve performance.

  • Of course we're all so busy trying to get you to change your design, we're kind of ignoring your actual question:

    quote:


    is it better to create a clustered index on a concatenated YYYYMMDD integer field or on separate YYYY, MM, DD fields


    One index is better than three in this case. You're going after a date range, so you would never look at the month portion while disgarding the year. Therefore, a sperate index on just the month or the date would not be very useful.

  • Thanks!

    As the Smalldatetime field is as large as an integer with YYYYMMDD format, we'll go for the smalldatetime as the natural key.

    A surrogate key should be arbitrary, but we won't do this for the Date dimension.

    We'll stay with the INT value, YYYYMMDD.

    This allows a clustered index on date field of the fact tables to be efficient.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply