Clustered index on non-primary key columns

  • Hello Folks,

    Is it a good practice to create clustered index on non-primary key columns since the primary key column is not queried at all. Please share your thoughts or best practices on this.

    Many thanks!

  • SQL!$@w$0ME (7/7/2016)


    Hello Folks,

    Is it a good practice to create clustered index on non-primary key columns since the primary key column is not queried at all. Please share your thoughts or best practices on this.

    Many thanks!

    There is ALL KINDS of decision inputs and logic that should be used to choose the indexes (ALL of them, not just the clustered one) on EVERY table. Well, at least if you want an optimally performing system.

    There are many times I recommend clustered index on a non-PK column(s). There are also MANY times I recommend to not have the silly int identity 1,1 that everyone seems to put on every SQL Server table (which is then obviously the clustered PK)

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I'm in agreement with Kevin.

    Since the clustered index defines data storage, I recommend that the cluster be the most commonly used path to the storage. If that's the primary key, great. If it's some other column (or columns), let's use that. SQL Server defaults to putting the cluster on the primary key, but that is absolutely not the only choice. Pick the right key for the cluster.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (7/8/2016)


    I'm in agreement with Kevin.

    Since the clustered index defines data storage, I recommend that the cluster be the most commonly used path to the storage. If that's the primary key, great. If it's some other column (or columns), let's use that. SQL Server defaults to putting the cluster on the primary key, but that is absolutely not the only choice. Pick the right key for the cluster.

    Would you not temper this advice at all, Grant? I would expect there to be circumstances where implementing according to this advice would result in poor INSERT performance.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Does the clustered key column(s) has to be NOT NULL/UNIQUE as best practice?

  • Phil Parkin (7/8/2016)


    Grant Fritchey (7/8/2016)


    I'm in agreement with Kevin.

    Since the clustered index defines data storage, I recommend that the cluster be the most commonly used path to the storage. If that's the primary key, great. If it's some other column (or columns), let's use that. SQL Server defaults to putting the cluster on the primary key, but that is absolutely not the only choice. Pick the right key for the cluster

    Would you not temper this advice at all, Grant? .

    I would expect there to be circumstances where implementing according to this advice would result in poor INSERT performance.

    Could you please advise on the situations which would result in poor INSERT performance.

    Thanks!

  • I think poor insert will not matter because of choice of coloumn selected but will depend upon unnecessary index and index fragmentation.

  • vineet2507 (7/8/2016)


    I think poor insert will not matter because of choice of coloumn selected but will depend upon unnecessary index and index fragmentation.

    High numbers of page splits caused as a result of INSERTs into a table with a poorly chosen clustered index require more time to process than INSERTs to a table with a narrow, unique, ever-increasing clustered index.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • SQL!$@w$0ME (7/8/2016)


    Does the clustered key column(s) has to be NOT NULL/UNIQUE as best practice?

    Yes. Otherwise SQL Server has to create its own 'uniquifier', to make the rows unique. Read more about it here[/url].

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin (7/8/2016)


    SQL!$@w$0ME (7/8/2016)


    Does the clustered key column(s) has to be NOT NULL/UNIQUE as best practice?

    Yes. Otherwise SQL Server has to create its own 'uniquifier', to make the rows unique. Read more about it here[/url].

    Here again, I wouldn't shy away from non-unique columns just because of this. Yes, absolutely, it adds to the overhead. Yes, that must be taken into account. No, it doesn't eliminate this as a choice.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Phil Parkin (7/8/2016)


    SQL!$@w$0ME (7/8/2016)


    Does the clustered key column(s) has to be NOT NULL/UNIQUE as best practice?

    Yes. Otherwise SQL Server has to create its own 'uniquifier', to make the rows unique. Read more about it here[/url].

    To be honest, non-unique is the one of the guidelines that I'm happiest to break, as long as the clustered index is still fairly unique (not 4 values in a 100 million row table). The uniquifier's only a 4-byte int and it's null for the first occurrence of any value

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Phil Parkin (7/8/2016)


    Grant Fritchey (7/8/2016)


    I'm in agreement with Kevin.

    Since the clustered index defines data storage, I recommend that the cluster be the most commonly used path to the storage. If that's the primary key, great. If it's some other column (or columns), let's use that. SQL Server defaults to putting the cluster on the primary key, but that is absolutely not the only choice. Pick the right key for the cluster.

    Would you not temper this advice at all, Grant? I would expect there to be circumstances where implementing according to this advice would result in poor INSERT performance.

    There are all kinds of qualifiers. INSERT and the resulting page splits are one worry. UPDATE with the associated rearrangement of nonclustered indexes is another (plus the possibility of page splits, depending on the data type, the update, etc.). Compound keys could lead to deeper indexes, resulting in more reads. The list goes on and on. Every single choice in SQL Server is a dance around the various trade-offs of that choice. I'm just saying that sticking with the default is a choice, like any other, and it has a ton of implications too. Way too many databases have a clustered primary key on an identity column where the identity column is just there because identity and the cluster is there because primary key and the column is never used as filter criteria for a single query. Thought should be applied to all the choices we make.

    By and large, since most systems are read heavy, using a simple rule of thumb, what gets me to the data the fastest on my reads, will be adequate for most of the design most of the time.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Any downside if the clustered key(non-primary key columns) is not unique and the columns are VARCHAR fields and NOT NULL?

    Or is it better to create Primary key (identity key) as Clustered and create a non-clustered index for the other columns that are querying on?

    Thanks!

  • SQL!$@w$0ME (7/8/2016)


    Any downside if the clustered key(non-primary key columns) is not unique and the columns are VARCHAR fields and NOT NULL?

    Or is it better to create Primary key (identity key) as Clustered and create a non-clustered index for the other columns that are querying on?

    Thanks!

    Yes. There's always a downside. That doesn't mean it's still not a better choice.

    Let's deal with the good news first. NOT NULL is good.

    Just saying VARCHAR doesn't tell me anything about whether this column is a good choice. First, question, and most important, is this the primary filter for the table? It needs to be the number one, most frequently used filter criteria for the WHERE clause, but, you must also take into account JOINs. Will a lot of other tables join to it? If so, the primary key might be the most common access path.

    Next, how wide is the VARCHAR? VARCHAR(3), it's probably not going to be selective enough to make for a good key column, clustered or nonclustered. VARCHAR(3000), it's way too wide to make an efficient key.

    How unique is the data? As Gail said, is it unique enough? Non-unique is not a showstopper, but only 75 unique values out of 50 million records is going to result in some pretty bad performance.

    How often will the data be changed? If it's constantly getting updated, this can be a very poor choice as I said in the previous post.

    Let's say that this is the primary filter and that it's a reasonable length (say VARCHAR(50) or less), and the data is pretty unique with minimal duplication (say, no more than 20-30 duplicates across a million rows), and it's fairly static data (like a city name, they can stay the same for hundreds of years, depending on the country and how often it gets invaded), then, yeah, it could be a great choice for the clustered key.

    However, another question entirely, what does uniquely identify a row on this table, and don't say identity. There has to be a logical, business-oriented, unique identifier (there are rare exceptions, but exceptions should be just that, exceptional). This unique identifier may be the more common filter. Even if we choose to create an identity column (and there are great reasons for this, don't get me wrong), you'll still have to have a unique index on this identifier (one or more columns). Don't lose track of that one.

    In short, without lots of understanding, there is no simple answer to your question.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • GilaMonster (7/8/2016)


    Phil Parkin (7/8/2016)


    SQL!$@w$0ME (7/8/2016)


    Does the clustered key column(s) has to be NOT NULL/UNIQUE as best practice?

    Yes. Otherwise SQL Server has to create its own 'uniquifier', to make the rows unique. Read more about it here[/url].

    To be honest, non-unique is the one of the guidelines that I'm happiest to break, as long as the clustered index is still fairly unique (not 4 values in a 100 million row table). The uniquifier's only a 4-byte int and it's null for the first occurrence of any value

    Just to be clear: does that mean that you are suggesting that uniqueness in the clustered key column is not a best practice?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

Viewing 15 posts - 1 through 15 (of 28 total)

You must be logged in to reply to this topic. Login to reply