Best Plan for Clustered Index

  • I have a table that is the primary table in the database.

    I am trying to determine what the best choice for the primary clustered index would be.

    There is a bigint identity column - probably the obvious choice

    But virtually all of the selects are based on a Datetime column.

    Is there a way to determine if having a clustered index on the identity column and a non-clustered index on the datetime column would be more effective than having a clustered index on the datetime+identity column?

    This is a very high volume table, so I want to have the minimum impact on inserts while retaining the maximum speed of selects based on the datetime.

    If one was to select ordering on the identity column alone, the datetime columns would be close but not totally sequential (just to keep it interesting).

    Thanks,

    Bill Mell

  • Bill Mell (9/30/2008)


    There is a bigint identity column - probably the obvious choice

    But virtually all of the selects are based on a Datetime column.

    You mentioned this is a high volume table and said there are alot of INSERTs, but the decision here could depend more on the frequency of UPDATEs. I'm assuming the identity column you mentioned is the primary key, and that's what an UPDATE would be referencing in the WHERE clause to specify what record to update.

    If the vast majority of your SELECTs are based on that Datetime column, and there are few or no UPDATEs, it would actually be worthwile to have your clustered index be on the Datetime + Bigint columns. From my experience, when the criteria in your WHERE clauses matches the first column(s) in your clustered index then the resulting range scan works very quickly. If the clustered index was only on the Bigint column, the database engine would have to do a range scan on the nonclustered index on your Datetime column, then look up those records matched in the table through the clustered index so it takes longer to query.

    ANother thing to consider, is there another table that has a foreign key back to thie Bigint column in this table? If so, and your where clause criteria is on a column from that child table, then that situation would be better addressed by the Bigint being the clustered index column.

  • I agree with what Chris said for the most part. Based on what you said in your original post, I would put the clustered index on the datetime + bigint fields.

    😎

  • The question I would ask myself in order to answer your question, what is the most common access path? What will be used more often than anything else to query the table? Will the queries usually be against the datetime column or will the ID be used the most? Because the data is stored with the clustered index, it's usually best to put it close to the most commonly accessed path, thus eliminating, or at least reducing, key lookups.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks to all.

    Good points!

    The table has no updates, but periodically there are deletes, but the deletes are based on various criteria.

    The bigint column is not used as a reference to another table.

    I was somewhat concerned that there might be insert issues for the random datetimes that are not in sequence with the bigint identity columns.

    That scenario is certainly the exception rather than the rule, but it does happen.

    I wish I could think of a way to determine that.

    Thanks,

    Bill

  • Jeez. I read the question twice and I still don't notice this:

    But virtually all of the selects are based on a Datetime column.

    That should probably be the leading edge of the index. Including the Identity column would be question best left to testing. I'm not convinced it would help or hurt based on what you've said (now that I've read it three times).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Bill Mell (9/30/2008)


    I was somewhat concerned that there might be insert issues for the random datetimes that are not in sequence with the bigint identity columns.

    I can't think of any reason this would be a problem. Do you have a maintenance plan that sometimes or frequently REORGANIZEs or REBUILDs the index? I could see the B-Tree of the index getting lopsided, but this could happen if the Datetime or Bigint column was the first column in the clustered index.

  • I have a routine the regularly defragments the indexes.

    My understanding was that a clustered index represents the physical order of the records in the database, and that on inserts if a record was inserted that had a datetime that was less than the previous one, the other records would be moved to accommodate the new row.

    Bill

  • Records won't be exactly be moved to make space.

    If the row has to go onto a page and there isn't space on the page, the page is spit in half, half the rows stay on the current page, half go onto the new page and then the new page is linked into the table. This may cause fragmentation (where a page higher in the index key order has a pageID lower in the file)

    Providing this doesn't happen too often, and you have a regular index rebuild running, it isn't too much of a problem

    Is that datetime ever updated? If so, when it is the row will have to be moved, as its 'address' (clustering key) will have changed.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply