Is there any wrong to create clustered index on identity column?

  • Hi,

    I had seen a lot of tables were created with clustered index on identity columns. One told me it is not a good pratice but didn't tell me why. Is there any wrong with it? Thanks.

    Chris

  • It depends on the environment and the volume of inserts. A clustered index on an identity forces all inserts to be to pages at the "end" of the table.

    Is the table part of an OLTP system with a large number of inserts per second ?

  • Yes, it is for OLTP. But the volume is not huge (< 100 per second for sure). Is there a gotcha for a large volume environment?

  • With large volumes, a cluster on an increasing column can create an insert hotspot, but the volumn really needs to be excessive (1000s of inserts/sec)

    It's actually quite a good idea to put the cluster on an identity. It's a narrow column, which means your non-clustered indexes aren't wider than necessary and, since new values are always added at the end, the index doesn't become fragmented easily.

    That said, there are no hard and fast rules about where to put clustered indexes, it differes for every environment.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi,

    Thanks for tips. I thought hotspot was history after SQL 7.0. Maybe I was wrong.

    Chris

  • No, hotspot isn't history - it's a fact of life.

    GilaMonster is quite right to say that there are no hard and fast rules about where to put a clustered index. By putting the clustered index on an identity column (presumably your primary key?), you've arguably wasted the index on a column that will only ever return single values. Clustered indexes are best at returning large volumes of data, such as a range search on a date column, since Non-clustered indexes are useless at such tasks.

    I tend to prefer NON-clustered Primary Key constraints on identity columns, saving the clustered index for a more appropriate column.

    Having said that, I appreciate that this may lead to more random inserts into the table and possible page splits, but it needn't do. Depends on your data, fillfactors, frequency of index rebuilds etc.

    All I'm trying to say, really, is don't slavishly put clustered indexes on PK columns. It can often be more beneficial to use NC indexes and reserve your clustered for something else.

  • Hi Journeyman,

    Thanks a lot. Agree, save the clustered index key for non identity column if there is a need. But say I am so sure there is no need to reserve the key, it should not hurt the performance if I have clustered index on identity column, right?

    Chris

  • If you have no better place to put it, then put it there.

    Also note that the potential of hot spot has to be weighed against the probability of page spilts and fragmented indexes if the cluster is on something that's not ever-increasing and you have large numbers of inserts.

    Personally I like DateInserted for a cluster, if I have a column like that. Bonus if the column's going to be used for range searches.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • What you heard about it being history in 7.0 was probably the change that allowed SQL to adjust the granularity of locks as needed. While this did indeed help reduce hotspot problems, it didn't eliminate them completely in high transaction environments as noted above.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply