March 8, 2011 at 7:29 am
Hi there,
I am seeking a general advise on whether it makes sense (or not) to include the fields of a primary key on a table to be in the INCLUDED COLUMNS on a composite index. To the best of my knowledge this seems like "overhead" as the composite index implicitly is "build" upon the clustered index - in this case the primary key.
The reason I ask is that we have tables with duplicate address info (and tons of other fields) and we want to perform a selection returning all the person id's where the address occurs more than once across the table.
USE [tempdb]
GO
/****** Object: Table [dbo].[Leads] Script Date: 03/08/2011 15:15:43 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Leads]') AND type in (N'U'))
DROP TABLE [dbo].[Leads]
GO
USE [tempdb]
GO
/****** Object: Table [dbo].[Leads] Script Date: 03/08/2011 15:15:43 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Leads](
[LeadId] [bigint] IDENTITY(1,1) NOT NULL,
[Email] [nvarchar](50) NULL,
[Name] [nvarchar](60) NULL,
[Address] [nvarchar](100) NULL,
[ZipCode] [nvarchar](20) NULL,
CONSTRAINT [PK_Leads] PRIMARY KEY CLUSTERED
(
[LeadId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
GO
USE [tempdb]
/****** Object: Index [IX_Leads_Address] Script Date: 03/08/2011 15:15:43 ******/
CREATE NONCLUSTERED INDEX [IX_Leads_Address] ON [dbo].[Leads]
(
[Address] ASC,
[ZipCode] ASC
)
INCLUDE ( [LeadId]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
GO
INSERT INTO [tempdb].[dbo].[Leads] (
[Name]
,[Address]
,[ZipCode]
)
SELECT 'Peter', 'Cumbersome Road 101', 'DK-1111'
UNION
SELECT 'Liz', 'Cumbersome Road 101', 'DK-1111'
UNION
SELECT 'Mike', 'East Street 508', 'DK-1234'
GO
WITH dedup
AS
(
SELECT
[DedupKey] = ROW_NUMBER() OVER (
PARTITION BY
[Address]
,[ZipCode]
ORDER BY
[LeadId]
)
, *
FROM [dbo].[Leads]
)
SELECT
*
FROM dedup
WHERE
[DedupKey] > 1
The reason I ask is that according to general recommendations for composite indexes it is beneficial to have the non-key columns in the "SELECT..." clause added to the index as INCLUDED COLUMN to avoid having to scan the tables data pages. But does this also apply when returning the primary key values?
Regards,
Claus Thorning Madsen
March 8, 2011 at 7:42 am
Right, the PK is already included in every NCI in a table with a clustered index, so the include is not needed/desired.
See the est exec plan with the include and without and notice they are the same (no clustered index scan/seek, no bookmark, etc).
Jim
Jim Murphy
http://www.sqlwatchmen.com
@SQLMurph
March 8, 2011 at 7:42 am
Sure. Why would it not?
SQL's not stupid enough to add them twice if you explicitly specify them. Also, what happens if the clustered index is moved?
btw, it's the clustered index columns that are relevant, not the primary key columns. They are not the same.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 8, 2011 at 7:44 am
[Jim].[dba].[Murphy] (3/8/2011)
Right, the PK is already included in every NCI in a table with a clustered index, so the include is not needed/desired.
The clustered index key columns are included in every nonclustered index. Not the primary key columns. Primary key != clustered index.
Why is it not desired?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 8, 2011 at 7:54 am
Hi again,
I know that a Primary Key is not the same as a clustered index. But in my current scenario they are the same. That is: [LeadId] is a primary key with a clustered index upon it.
I know that if [LeadId] had not been the clustered index, then it would make a lot of sense to add it to my INCLUDED COLUMNS list for the composite index to avoid going for the data pages for the table to get the values for the field.
But when the field IS the clustered index I will now conclude that having the field in the INCLUDED COLUMNS list for the composite index only makes the index "wider" without adding any beneficial performance. It only makes my index maintenance "heavier". Correct?
Regards,
Claus
March 8, 2011 at 8:03 am
Gail,
Boy, I'm battin a million today. I misspoke and meant the CI value, not the PK value. I definitely know the difference.
I need to measure twice and cut once today. Er. Everyday. 🙂
Jim Murphy
http://www.sqlwatchmen.com
@SQLMurph
March 8, 2011 at 8:07 am
clausm73 (3/8/2011)
But when the field IS the clustered index I will now conclude that having the field in the INCLUDED COLUMNS list for the composite index only makes the index "wider" without adding any beneficial performance. It only makes my index maintenance "heavier". Correct?
Incorrect.
The column is already there implicitly. Explicitly specifying it will NOT result in it being in the index twice. SQL is not that stupid. It knows that the column it's implicitly adding is the same as the one you've specified and hence it will not add it a second time.
I recommend explicitly adding the clustered key column if it is needed. One for clarity, so that you don't have to look at two indexes to see if the NC index covers the query or not. The second for 'future-proofing' so that if someone in the future moves the clustered index the nonclustered does not suddenly become not covering for that important query
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 8, 2011 at 1:30 pm
Thank you both.
Now I finally understood it - and also learned that a more appropriate subject title would have been "Clustered Index fields as an Included Columns in a Composite Index".
Gail, I like your point ajour having the field explicitly added to the composite index. This will be my "Best Practise" recommendation to my colleagues going forward.
Thanks again for the clarifications.
Claus Thorning Madsen
March 8, 2011 at 1:53 pm
clausm73 (3/8/2011)
Gail, I like your point ajour having the field explicitly added to the composite index. This will be my "Best Practise" recommendation to my colleagues going forward.
I agree. I took note of this great tip as well. It makes two handfuls of sense. Thanks Gail!
Jim
Jim Murphy
http://www.sqlwatchmen.com
@SQLMurph
March 8, 2011 at 2:43 pm
GilaMonster (3/8/2011)
I recommend explicitly adding the clustered key column if it is needed. One for clarity, so that you don't have to look at two indexes to see if the NC index covers the query or not. The second for 'future-proofing' so that if someone in the future moves the clustered index the nonclustered does not suddenly become not covering for that important query
Thanks Gail, as usual. As much as I've batted around the NC/CI inclusion routine, I think I just realized why about 6/7 years ago I broke a client's system when I rebuilt a CI in a different way on a massive Fact table due to your comment. The system was too critical to allow me to test why at the time.
I believe I'll be following this practice going forward for NC indexes that require the CI involved... for the next poor soul.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply