INCLUDE columns in nonclustered index which are in PK

Question

INCLUDE columns in nonclustered index which are in PK

Xedni

SSCertifiable

Points: 6951
More actions
June 27, 2017 at 9:21 am

#402219

Hi SSC,
I was discussing include columns with a co-worker because he had put primary key columns in his list of INCLUDE columns in a non-clustered index he built. I was going to suggest he remove them (because a non-clustered index already HAS to have a reference to the primary key), and perhaps he still should just for clarity, but I also looked at index stats and noticed that the two indices were identical in size and pages. the plus side, it would seem SQL is smart enough not to double count those columns.
My question is this: Other than the fact that it's unnecessary, is there anywhere else SQL might stumble or actually incur extra overhead because of these columns being in the INCLUDE list? Or is it just a matter of style?
FWIW, Here's the test I ran:
use test go
-- Create a table if object_id('Test.dbo.TestReindex') is not null drop table dbo.TestReindex create table Test.dbo.TestReindex ( RID int identity(1,1), SomeGuid uniqueidentifier not null default newsequentialid(), SomeString varchar(36), SomeDate datetime2 constraint PKC__TestReindex__RID primary key clustered (RID, SomeGuid) )
-- Load some fake data insert into dbo.TestReindex (SomeString, SomeDate) select cast(newid() as varchar(36)), dateadd(minute, checksum(newid()) % 50, sysdatetime()) from sys.all_objects a, sys.all_objects b
-- Create a NCI with the PK columns INCLUDEd create nonclustered index IX__PKInclude on dbo.TestReindex (SomeString) include (RID, SomeGuid)
-- Create a NCI without PK columns INCLUDED create nonclustered index IX__PKNoInclude on dbo.TestReindex (SomeString)
-- Check index size select IndexName = i.Name, TableName = object_name(p.object_id), IndexRows = p.rows, DataCompression = data_compression_desc, SizeKB = 8 * au.used_pages, ps.in_row_data_page_count, ps.in_row_used_page_count, ps.in_row_reserved_page_count, ps.lob_used_page_count, ps.lob_reserved_page_count, ps.row_overflow_used_page_count, ps.row_overflow_reserved_page_count, ps.used_page_count, ps.reserved_page_count, ps.row_count from sys.partitions p inner join sys.dm_db_partition_stats ps on p.partition_id = ps.partition_id and p.object_id = ps.object_id inner join sys.indexes i on p.object_id = i.object_id and p.index_id = i.index_id inner join sys.allocation_units au on p.partition_id = au.container_id where p.object_id = object_id('Test.dbo.TestReindex')
Executive Junior Cowboy Developer, Esq.[/url]

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Adi Cohn-120898 SSC-Dedicated Points: 33944 More actions · Answer 1

The index is using the columns that the clustered index is based on as the pointers to the row in the table. Since by default the primary key is also clustered, most of the times you'll see that it is using the columns that the primary key is based on.
As you wrote SQL Server is smart enough not to use those columns twice. If you include the columns that the clustered index is based on, it won't add them because it already has them in the index as the pointers to the real data. Personally if those columns are needed in the index as included columns, I will add them. I do it just to be sure that if in the future someone will change the clustered index, then those columns will still be in the included columns.

Adi

--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Xedni SSCertifiable Points: 6951 More actions · Answer 2

Adi Cohn-120898 - Tuesday, June 27, 2017 9:30 AM
... Personally if those columns are needed in the index as included columns, I will add them. I do it just to be sure that if in the future someone will change the clustered index, then those columns will still be in the included columns.
Adi

Interesting point; I had not thought about that.

Executive Junior Cowboy Developer, Esq.[/url]

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 3

Xedni - Tuesday, June 27, 2017 9:38 AM
Adi Cohn-120898 - Tuesday, June 27, 2017 9:30 AM
... Personally if those columns are needed in the index as included columns, I will add them. I do it just to be sure that if in the future someone will change the clustered index, then those columns will still be in the included columns.
Adi
Interesting point; I had not thought about that.

Almost no one does. And almost no one really needs to. But every now and again someone gets blind-sided when some queries all of a sudden start going out to lunch because said columns are now "missing" from index(es).

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

ScottPletcher SSC Guru Points: 101118 More actions · Answer 4

Xedni - Tuesday, June 27, 2017 9:21 AM

I was discussing include columns with a co-worker because he had put primary key columns in his list of INCLUDE columns in a non-clustered index he built. I was going to suggest he remove them (because a non-clustered index already HAS to have a reference to the primary key)

Actually that's not technically accurate. Non-clustered indexes will have all the key columns of the clustered index, but that does not have to be the PK. Indeed, best overall performance often demands that it not be. For example, you might cluster by ( datetime, location, ident ) or by ( location, datetime, ident ) but have ident alone as the PK.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Xedni SSCertifiable Points: 6951 More actions · Answer 5

ScottPletcher - Tuesday, June 27, 2017 12:02 PM

Xedni - Tuesday, June 27, 2017 9:21 AM
I was discussing include columns with a co-worker because he had put primary key columns in his list of INCLUDE columns in a non-clustered index he built. I was going to suggest he remove them (because a non-clustered index already HAS to have a reference to the primary key)

Actually that's not technically accurate. Non-clustered indexes will have all the key columns of the clustered index, but that does not have to be the PK. Indeed, best overall performance often demands that it not be. For example, you might cluster by ( datetime, location, ident ) or by ( location, datetime, ident ) but have ident alone as the PK.

You're right, I misspoke. Side effect of most PK's being clustered 🙂

Executive Junior Cowboy Developer, Esq.[/url]