June 27, 2017 at 9:21 am
Hi SSC,
I was discussing include columns with a co-worker because he had put primary key columns in his list of INCLUDE columns in a non-clustered index he built. I was going to suggest he remove them (because a non-clustered index already HAS to have a reference to the primary key), and perhaps he still should just for clarity, but I also looked at index stats and noticed that the two indices were identical in size and pages. the plus side, it would seem SQL is smart enough not to double count those columns.
My question is this: Other than the fact that it's unnecessary, is there anywhere else SQL might stumble or actually incur extra overhead because of these columns being in the INCLUDE list? Or is it just a matter of style?
FWIW, Here's the test I ran:
use test
go
-- Create a table
if object_id('Test.dbo.TestReindex') is not null drop table dbo.TestReindex
create table Test.dbo.TestReindex
(
RID int identity(1,1),
SomeGuid uniqueidentifier not null default newsequentialid(),
SomeString varchar(36),
SomeDate datetime2
constraint PKC__TestReindex__RID primary key clustered (RID, SomeGuid)
)
-- Load some fake data
insert into dbo.TestReindex (SomeString, SomeDate)
select cast(newid() as varchar(36)), dateadd(minute, checksum(newid()) % 50, sysdatetime())
from sys.all_objects a, sys.all_objects b
-- Create a NCI with the PK columns INCLUDEd
create nonclustered index IX__PKInclude on dbo.TestReindex (SomeString) include (RID, SomeGuid)
-- Create a NCI without PK columns INCLUDED
create nonclustered index IX__PKNoInclude on dbo.TestReindex (SomeString)
-- Check index size
select
IndexName = i.Name,
TableName = object_name(p.object_id),
IndexRows = p.rows,
DataCompression = data_compression_desc,
SizeKB = 8 * au.used_pages,
ps.in_row_data_page_count,
ps.in_row_used_page_count,
ps.in_row_reserved_page_count,
ps.lob_used_page_count,
ps.lob_reserved_page_count,
ps.row_overflow_used_page_count,
ps.row_overflow_reserved_page_count,
ps.used_page_count,
ps.reserved_page_count,
ps.row_count
from sys.partitions p
inner join sys.dm_db_partition_stats ps
on p.partition_id = ps.partition_id
and p.object_id = ps.object_id
inner join sys.indexes i
on p.object_id = i.object_id
and p.index_id = i.index_id
inner join sys.allocation_units au
on p.partition_id = au.container_id
where p.object_id = object_id('Test.dbo.TestReindex')
June 27, 2017 at 9:30 am
The index is using the columns that the clustered index is based on as the pointers to the row in the table. Since by default the primary key is also clustered, most of the times you'll see that it is using the columns that the primary key is based on.
As you wrote SQL Server is smart enough not to use those columns twice. If you include the columns that the clustered index is based on, it won't add them because it already has them in the index as the pointers to the real data. Personally if those columns are needed in the index as included columns, I will add them. I do it just to be sure that if in the future someone will change the clustered index, then those columns will still be in the included columns.
Adi
--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
June 27, 2017 at 9:38 am
Adi Cohn-120898 - Tuesday, June 27, 2017 9:30 AM... Personally if those columns are needed in the index as included columns, I will add them. I do it just to be sure that if in the future someone will change the clustered index, then those columns will still be in the included columns.Adi
Interesting point; I had not thought about that.
June 27, 2017 at 11:20 am
Xedni - Tuesday, June 27, 2017 9:38 AMAdi Cohn-120898 - Tuesday, June 27, 2017 9:30 AM... Personally if those columns are needed in the index as included columns, I will add them. I do it just to be sure that if in the future someone will change the clustered index, then those columns will still be in the included columns.Adi
Interesting point; I had not thought about that.
Almost no one does. And almost no one really needs to. But every now and again someone gets blind-sided when some queries all of a sudden start going out to lunch because said columns are now "missing" from index(es).
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
June 27, 2017 at 12:02 pm
Actually that's not technically accurate. Non-clustered indexes will have all the key columns of the clustered index, but that does not have to be the PK. Indeed, best overall performance often demands that it not be. For example, you might cluster by ( datetime, location, ident ) or by ( location, datetime, ident ) but have ident alone as the PK.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 27, 2017 at 3:34 pm
Xedni - Tuesday, June 27, 2017 9:21 AMI was discussing include columns with a co-worker because he had put primary key columns in his list of INCLUDE columns in a non-clustered index he built. I was going to suggest he remove them (because a non-clustered index already HAS to have a reference to the primary key)
Actually that's not technically accurate. Non-clustered indexes will have all the key columns of the clustered index, but that does not have to be the PK. Indeed, best overall performance often demands that it not be. For example, you might cluster by ( datetime, location, ident ) or by ( location, datetime, ident ) but have ident alone as the PK.
You're right, I misspoke. Side effect of most PK's being clustered 🙂
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply