Clustered index on DATETIME: hidden problems?

  • I have a event logging table (planned to receive millions of rows) that is clustered on a DATETIME column. The value of the column is set via a default using GETDATE().

    I am concerned about the server running for a long period of time, correctly processing events, and then having its time set back by accident (i.e. from 1/1/2010 9:00AM to 1/1/2010 8:00AM).

    If this happened:

    1) The processing time of the new events would be incorrect

    2) Because of the clustered index, the new events would be physically stored in the wrong order

    Is this an inherent design flaw in using a GETDATE() value in a clustered index? Is there an established method to prevent this type of problem?

  • I've never had the system time reset on a server before. In most enterprise systems I've worked, we have a system clock that sets the time for the servers. Of all the problems, I'd hope this one wouldn't be that big a deal.

    However, assuming that the time changed and you had a few more records at 9AM that are supposed to be 10AM or 8AM, except for the concerns surrounding the data and any impact bad data has on the business, it won't seriously negatively impact the clustered index.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • As Grant said it is highly unlikely that the system time on a server be reset. I'd be more concerned with the change from DST back to Standard Time. One way to offset that is to use a UTC Date instead of the local date. So you'd use GETUTCDATE() instead of GetDate(). As long as you document this anything accessing this data could account for the time change via a Daylight Savings Time table.

  • Is this a DST thing?

    That's tricky in and of itself. If I sell a widget (or record some event) at 8:05am, and then we have a DST rest at 8:59:59 going to 8:00:00, and record a later sale at 8:01am, which one occurred first?

    Obviously in real human time we know that the 8:05 sale occurred before the 8:01 sale. However we've artificially repeated an hour. A week later, we might not realize ourselves which one was first. For auditing, how should it be handled?

    In terms of the server, you might end up with some page splitting and insertions in the middle of other rows for that hour, but I wouldn't think that this happening once a year for an hour that it would be a huge performance issue. A larger fillfactor might even negate any issues.

    If it happened regularly, I'd think you have other issues far beyond what the CI is on this table.

  • To everyone, thanks for all your input.

    This is my first experience implementing an enterprise system and so I didn't even know that things like system clocks exist (although it seems obvious when I think about it). We are logging on UTC time so DST is not a concern.

    The events will be loaded into the system through various outside feeds and logged with an entry date into the system. Users will be periodically polling the server for the most recent events. So a user might poll the system at 8:50am, 8:55am, 9:00am, etc. If, at 9am, the system time was set back an hour, the user would be polling for new events that are now being logged at 8am. It would take an hour for the user to start seeing events again, and in the meanwhile, he would have missed an hour's worth of data.

    Since the loading process runs in the background, there is no way to raise an error to a specific user that the time has changed. I would assume we would need someone monitoring the process, or have a process that checks the tables for dates that are suddenly out of order? The other thought is, during the insert, to check if the new date is less than the maximum date in the table and then raise some kind of flag. But that seems like a potential performance hit if the table gets big enough.

  • Same solution, use utc date both in the CI and in the where condition (UTCDATE - 5 minutes). And you're safe.

  • ...If, at 9am, the system time was set back an hour, the user would be polling for new events that are now being logged at 8am...

    The system time should not change like that, as long at Windows was configured to keep it's time synced to an accurate time source.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply