Cluster index in identity column or actual value which is uniq. Pls guide

  • Hello,

    Is it a good practice to have clustered index on a identity column or a column which is having a unique field like the userid.

    is it advisable to have identity columns in all the tables and in all the tables the identity is set as a clustered index. please advise as we are creating data model for all the tables

     

     

  • Identity columns are really good as primary keys. They are small, and keeping an index small results in faster searches. However, whether identity columns should be part of the clustered index depends on your queries. If you are selecting based on the primary key field, it makes sense to make it part of the clustered index. If you are selecting primarily by a data field that does not change, making it clustered may be a better solution. Also, if you are sorting frequently on a column, it may be worth setting up clustering on this column. Do you know what kind of queries you will be running? How frequently?

    Regards,

    Andras


    Andras Belokosztolszki, MCPD, PhD
    GoldenGate Software

  • in our data model the identity columns are not used for searching just for consistency purpose, we have created identity columns. we are not going to use this field  for where condition or select condition also.

    so for what i know is the queries will not be based on the identity columns.

    Thanks

     

     

  • clustered indexes are more suited to range-seek (like date ranges). (benefit of having a sequential IO feed because the data is ordered sequential). If you only fetch 1 item at a time the benefit of having them layed out sequential is lost)

  • Based on this you would want your clusered index on another column or columns and if you are using the identity column in joins you may want a non-clustered index on it.

  • 1) There is a benefit to having a clustered index on an identity, even for single row lookups: you don't have to do a bookmark lookup to get to the data because you are already there.

    2) Moving the clustered index to another column (or even worse, columns) leads to the following:

    a) if not unique, you get a 4 byte "uniquefier" under the covers.

    b) even if (a) doesn't apply, all bytes of a clustered index are carried along as the pointers on EVERY non-clustered index. Result can be a very fat, inefficient index strategy.

    c) if clustered index is on identity, IIRC the storage engine is smart enough to not split the tail page 50/50 each time it fills up. It just creates a new page since by definition all values in identity will be increasing and in order. You don't get the benefit of this if you use a char or date column.

    3) Indexing should be applied to a table based on data distribution, column characteristics and most importantly your application's data access patterns.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I assume that the student id number is ubiquitous in your environment. Each student quickly learns his own and the teacher's class roster has them listed. Whenever any of them wants to find information, he has the student number ready. Your application must take the student number and access the database to find the identity value associated with it in order to then access the database a second time to find whatever information they are seeking. But you already have all the data you need -- the student number. Why add an extra layer of access if you don't need to?

    I cringe whenever I hear someone say they create identity columns on every table as a matter of policy or for the sake of "consistency." If a table design is improved by an identity field, add it. But if it already has a perfectly usable natural key, why add another? For one thing it creates more work both in the design and implementation. For example, unless you make an extra effort, it will consume the one clustered index you get for the table. You also have to take the effort to add a unique index on the natural key field(s).

    This practice (identity field on every table) can lead to ridiculous extremes. Creating an identity field for an intersection table is a study in futility. I have seen them, even in learned texts. But I have yet to see any select statement where the field is used. The field is there but, not only is it not used, there is no way to actually make any practical use of it. Try it.

    There is also the temptation to get a little lazy when modeling your data. Every table will automatically get a unique identifier, so if you get a little sloppy in properly identifying the natural key of the entity, no matter. Whatever you try to insert into the table is guaranteed to be unique. This can be a blessing or a curse.

    The point is, there is a tendency to think that if a little is good, more should be better. But remember that the definition of a toxin is not "what" but "how much." Too much of almost anything can be bad for you.

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply