Clustered index on non-primary key columns

Question

Clustered index on non-primary key columns

SQL!$@w$0ME

SSChampion

Points: 12349
More actions
July 7, 2016 at 4:10 pm

#400377

Hello Folks,
Is it a good practice to create clustered index on non-primary key columns since the primary key column is not queried at all. Please share your thoughts or best practices on this.
Many thanks!

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 1

SQL!$@w$0ME (7/7/2016)
Hello Folks,
Is it a good practice to create clustered index on non-primary key columns since the primary key column is not queried at all. Please share your thoughts or best practices on this.
Many thanks!

There is ALL KINDS of decision inputs and logic that should be used to choose the indexes (ALL of them, not just the clustered one) on EVERY table. Well, at least if you want an optimally performing system.

There are many times I recommend clustered index on a non-PK column(s). There are also MANY times I recommend to not have the silly int identity 1,1 that everyone seems to put on every SQL Server table (which is then obviously the clustered PK)

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 2

I'm in agreement with Kevin.

Since the clustered index defines data storage, I recommend that the cluster be the most commonly used path to the storage. If that's the primary key, great. If it's some other column (or columns), let's use that. SQL Server defaults to putting the cluster on the primary key, but that is absolutely not the only choice. Pick the right key for the cluster.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Phil Parkin SSC Guru Points: 247227 More actions · Answer 3

Grant Fritchey (7/8/2016)
I'm in agreement with Kevin.
Since the clustered index defines data storage, I recommend that the cluster be the most commonly used path to the storage. If that's the primary key, great. If it's some other column (or columns), let's use that. SQL Server defaults to putting the cluster on the primary key, but that is absolutely not the only choice. Pick the right key for the cluster.

Would you not temper this advice at all, Grant? I would expect there to be circumstances where implementing according to this advice would result in poor INSERT performance.

SQL!$@w$0ME SSChampion Points: 12349 More actions · Answer 4

Does the clustered key column(s) has to be NOT NULL/UNIQUE as best practice?

SQL!$@w$0ME SSChampion Points: 12349 More actions · Answer 5

Phil Parkin (7/8/2016)
Grant Fritchey (7/8/2016)
I'm in agreement with Kevin.
Since the clustered index defines data storage, I recommend that the cluster be the most commonly used path to the storage. If that's the primary key, great. If it's some other column (or columns), let's use that. SQL Server defaults to putting the cluster on the primary key, but that is absolutely not the only choice. Pick the right key for the cluster
Would you not temper this advice at all, Grant? .
I would expect there to be circumstances where implementing according to this advice would result in poor INSERT performance.

Could you please advise on the situations which would result in poor INSERT performance.

Thanks!

vineet2507 SSC Rookie Points: 49 More actions · Answer 6

I think poor insert will not matter because of choice of coloumn selected but will depend upon unnecessary index and index fragmentation.

Phil Parkin SSC Guru Points: 247227 More actions · Answer 7

vineet2507 (7/8/2016)
I think poor insert will not matter because of choice of coloumn selected but will depend upon unnecessary index and index fragmentation.

High numbers of page splits caused as a result of INSERTs into a table with a poorly chosen clustered index require more time to process than INSERTs to a table with a narrow, unique, ever-increasing clustered index.

Phil Parkin SSC Guru Points: 247227 More actions · Answer 8

SQL!$@w$0ME (7/8/2016)
Does the clustered key column(s) has to be NOT NULL/UNIQUE as best practice?

Yes. Otherwise SQL Server has to create its own 'uniquifier', to make the rows unique. Read more about it here[/url].

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 9

Phil Parkin (7/8/2016)
SQL!$@w$0ME (7/8/2016)
Does the clustered key column(s) has to be NOT NULL/UNIQUE as best practice?
Yes. Otherwise SQL Server has to create its own 'uniquifier', to make the rows unique. Read more about it here[/url].

Here again, I wouldn't shy away from non-unique columns just because of this. Yes, absolutely, it adds to the overhead. Yes, that must be taken into account. No, it doesn't eliminate this as a choice.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 10

Phil Parkin (7/8/2016)
SQL!$@w$0ME (7/8/2016)
Does the clustered key column(s) has to be NOT NULL/UNIQUE as best practice?
Yes. Otherwise SQL Server has to create its own 'uniquifier', to make the rows unique. Read more about it here[/url].

To be honest, non-unique is the one of the guidelines that I'm happiest to break, as long as the clustered index is still fairly unique (not 4 values in a 100 million row table). The uniquifier's only a 4-byte int and it's null for the first occurrence of any value

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 11

Phil Parkin (7/8/2016)
Grant Fritchey (7/8/2016)
I'm in agreement with Kevin.
Since the clustered index defines data storage, I recommend that the cluster be the most commonly used path to the storage. If that's the primary key, great. If it's some other column (or columns), let's use that. SQL Server defaults to putting the cluster on the primary key, but that is absolutely not the only choice. Pick the right key for the cluster.
Would you not temper this advice at all, Grant? I would expect there to be circumstances where implementing according to this advice would result in poor INSERT performance.

There are all kinds of qualifiers. INSERT and the resulting page splits are one worry. UPDATE with the associated rearrangement of nonclustered indexes is another (plus the possibility of page splits, depending on the data type, the update, etc.). Compound keys could lead to deeper indexes, resulting in more reads. The list goes on and on. Every single choice in SQL Server is a dance around the various trade-offs of that choice. I'm just saying that sticking with the default is a choice, like any other, and it has a ton of implications too. Way too many databases have a clustered primary key on an identity column where the identity column is just there because identity and the cluster is there because primary key and the column is never used as filter criteria for a single query. Thought should be applied to all the choices we make.

By and large, since most systems are read heavy, using a simple rule of thumb, what gets me to the data the fastest on my reads, will be adequate for most of the design most of the time.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

SQL!$@w$0ME SSChampion Points: 12349 More actions · Answer 12

Any downside if the clustered key(non-primary key columns) is not unique and the columns are VARCHAR fields and NOT NULL?

Or is it better to create Primary key (identity key) as Clustered and create a non-clustered index for the other columns that are querying on?

Thanks!

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 13

SQL!$@w$0ME (7/8/2016)
Any downside if the clustered key(non-primary key columns) is not unique and the columns are VARCHAR fields and NOT NULL?
Or is it better to create Primary key (identity key) as Clustered and create a non-clustered index for the other columns that are querying on?
Thanks!

Yes. There's always a downside. That doesn't mean it's still not a better choice.

Let's deal with the good news first. NOT NULL is good.

Just saying VARCHAR doesn't tell me anything about whether this column is a good choice. First, question, and most important, is this the primary filter for the table? It needs to be the number one, most frequently used filter criteria for the WHERE clause, but, you must also take into account JOINs. Will a lot of other tables join to it? If so, the primary key might be the most common access path.

Next, how wide is the VARCHAR? VARCHAR(3), it's probably not going to be selective enough to make for a good key column, clustered or nonclustered. VARCHAR(3000), it's way too wide to make an efficient key.

How unique is the data? As Gail said, is it unique enough? Non-unique is not a showstopper, but only 75 unique values out of 50 million records is going to result in some pretty bad performance.

How often will the data be changed? If it's constantly getting updated, this can be a very poor choice as I said in the previous post.

Let's say that this is the primary filter and that it's a reasonable length (say VARCHAR(50) or less), and the data is pretty unique with minimal duplication (say, no more than 20-30 duplicates across a million rows), and it's fairly static data (like a city name, they can stay the same for hundreds of years, depending on the country and how often it gets invaded), then, yeah, it could be a great choice for the clustered key.

However, another question entirely, what does uniquely identify a row on this table, and don't say identity. There has to be a logical, business-oriented, unique identifier (there are rare exceptions, but exceptions should be just that, exceptional). This unique identifier may be the more common filter. Even if we choose to create an identity column (and there are great reasons for this, don't get me wrong), you'll still have to have a unique index on this identifier (one or more columns). Don't lose track of that one.

In short, without lots of understanding, there is no simple answer to your question.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Phil Parkin SSC Guru Points: 247227 More actions · Answer 14

GilaMonster (7/8/2016)
Phil Parkin (7/8/2016)
SQL!$@w$0ME (7/8/2016)
Does the clustered key column(s) has to be NOT NULL/UNIQUE as best practice?
Yes. Otherwise SQL Server has to create its own 'uniquifier', to make the rows unique. Read more about it here[/url].
To be honest, non-unique is the one of the guidelines that I'm happiest to break, as long as the clustered index is still fairly unique (not 4 values in a 100 million row table). The uniquifier's only a 4-byte int and it's null for the first occurrence of any value

Just to be clear: does that mean that you are suggesting that uniqueness in the clustered key column is not a best practice?