April 16, 2015 at 9:01 am
All
I am fighting to get a PK change from Non-Clustered to Clustered, with more vigorous testing than is currently being suggested. It is the main table of the db. As far as I am aware and to my knowledge changing from PK NON-C to PK Clustered is correct to do.
But if you are not prepared to fully test all scripts which run against the table it can cause issues, as it will now force SQL optimiser to work differently on the table and calculate different paths to get to the data. Some queries may improve some may crash the application due to poor performance.
Am I correct. If so do you have a link to an article or link that will back me up.
Thanks
April 16, 2015 at 9:10 am
There are many, many times when the PK should not be the clustered index. Why do you think it should be changed??
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 16, 2015 at 10:19 am
Not me 3rd Party. The table is a Heap it is badly designed with far too many columns and serving many different purposes.
They believe it will improve performance. My view is not without extensive testing.
As it is a major change to the table.
April 16, 2015 at 10:24 am
Talib123 (4/16/2015)
Not me 3rd Party. The table is a Heap it is badly designed with far too many columns and serving many different purposes.They believe it will improve performance. My view is not without extensive testing.
As it is a major change to the table.
if the table is a heap, it should be changed to have a clustered index.
best practice is every table should have a clustered index, unless you have a specific , compelling reason for it not to.
there are lots of reasons why heaps can be less than optimal. changing the PK to be clustered where no clustered index existed before is a no-brainer for me.
the statement "Some queries may improve some may crash the application due to poor performance." i think is unsubstantiated. i've never, ever seen an crash due to adding a clustered index.
Lowell
April 16, 2015 at 11:04 am
Lowell (4/16/2015)
Talib123 (4/16/2015)
Not me 3rd Party. The table is a Heap it is badly designed with far too many columns and serving many different purposes.They believe it will improve performance. My view is not without extensive testing.
As it is a major change to the table.
if the table is a heap, it should be changed to have a clustered index.
best practice is every table should have a clustered index, unless you have a specific , compelling reason for it not to.
there are lots of reasons why heaps can be less than optimal. changing the PK to be clustered where no clustered index existed before is a no-brainer for me.
the statement "Some queries may improve some may crash the application due to poor performance." i think is unsubstantiated. i've never, ever seen an crash due to adding a clustered index.
Yes, it needs a clustered index, but not necessarily the same as the PK. Maybe the PK is a guid. Maybe the table is always searched by a different value.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 16, 2015 at 12:21 pm
Hi All
I did state in my original post that I agree it should have a clustered index.
My question is that changing a table which has multiple indexes and has many varying statements being executed against it, should have a very high degree of testing applied as creating a Clustered index can have a negative impact for some queries and a positive impact on others. This table in particular will have hundreds of different queries running against it.
My question is of degree of testing. Yes I have seen a change of PK to clustered on a table completely bring an application down.
April 16, 2015 at 12:32 pm
Talib123 (4/16/2015)
As far as I am aware and to my knowledge changing from PK NON-C to PK Clustered is correct to do.Thanks
The right answer is, it depends. To mention a few, it depends of:
-Access pattern in your table (reads, writes, inserts, updates)
-Your table's schema
-The actual data type(s) of the keys that are part of your PK.
-If your PK is sequential or not
I just recently changed a PK from PK/CI to a simply UC and fixed a horrible fragmentation issue. I created a surrogate PK and made it a CI. But I performed a 4 weeks testing, talked to app. team and developers, and was logging Index access patterns for months, even before I saw the issue.
If you post your table's schema we may be able to help or provide a more precise answer. But I would suggest to capture Index utilization for several weeks so you can understand how the table is being used. In fact, running Extended Event for a few min (or when the table is being accessed) may be a good idea too.
But one thing is true... if the table is a heap, it certainly need a CI.
April 16, 2015 at 1:07 pm
You will need to look at least at the stats SQL provides for indexes:
1) missing index stats and
2) index usage stats.
Between those and you knowledge of the table, we can come up with a decent clustered index recommendation.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 16, 2015 at 1:08 pm
For example:
--USE [<your_db_name_here>] --naturally make sure you are in the right db
SET DEADLOCK_PRIORITY LOW --probably irrelevant, but just in case
DECLARE @list_missing_indexes bit
DECLARE @table_name_pattern sysname
--NOTE: showing missing indexes can take some time; set to 0 if you don't want to wait.
SET @list_missing_indexes = 1
SET @table_name_pattern = '??' --<<-- !table name goes here!
--SET @table_name_pattern = '%'
PRINT 'Started @ ' + CONVERT(varchar(30), GETDATE(), 120)
IF @list_missing_indexes = 1
BEGIN
SELECT
GETDATE() AS capture_date,
DB_NAME(mid.database_id) AS Db_Name,
dps.row_count,
OBJECT_NAME(mid.object_id /*, mid.database_id*/) AS Table_Name,
mid.equality_columns, mid.inequality_columns, mid.included_columns,
user_seeks, user_scans, ca1.max_days_active, unique_compiles,
last_user_seek, last_user_scan, avg_total_user_cost, avg_user_impact,
system_seeks, system_scans, last_system_seek, last_system_scan, avg_total_system_cost, avg_system_impact,
mid.statement, mid.object_id, mid.index_handle
FROM sys.dm_db_missing_index_details mid WITH (NOLOCK)
CROSS APPLY (
SELECT DATEDIFF(DAY, create_date, GETDATE()) AS max_days_active FROM sys.databases WHERE name = 'tempdb'
) AS ca1
LEFT OUTER JOIN sys.dm_db_missing_index_groups mig WITH (NOLOCK) ON
mig.index_handle = mid.index_handle
LEFT OUTER JOIN sys.dm_db_missing_index_group_stats migs WITH (NOLOCK) ON
migs.group_handle = mig.index_group_handle
LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON
dps.object_id = mid.object_id AND
dps.index_id IN (0, 1)
--order by
--DB_NAME, Table_Name, equality_columns
WHERE
1 = 1
AND mid.database_id = DB_ID() --only current db
AND OBJECT_NAME(mid.object_id) LIKE @table_name_pattern
--AND mid.object_id IN (OBJECT_ID('<table_name_1>'), OBJECT_ID('<table_name_2>'))
ORDER BY
--avg_total_user_cost * (user_seeks + user_scans) DESC,
Db_Name, Table_Name, equality_columns, inequality_columns
END --IF
PRINT 'Midpoint @ ' + CONVERT(varchar(30), GETDATE(), 120)
-- list index usage stats (seeks, scans, etc.)
SELECT
ius2.row_num, DB_NAME() AS db_name,
CASE WHEN i.name LIKE ca2.table_name + '%'
THEN '~' + SUBSTRING(i.name, LEN(ca2.table_name) + 1, 200)
ELSE i.name END AS index_name,
CASE WHEN i.is_unique = 0 THEN 'N' ELSE 'Y' END + '.' +
CASE WHEN i.is_primary_key = 0 AND i.is_unique_constraint = 0 THEN 'N' ELSE 'Y' END AS [uniq?],
ca2.table_name,
i.index_id, --ius.user_seeks + ius.user_scans AS total_reads,
dps.row_count,
SUBSTRING(key_cols, 3, 8000) AS key_cols, SUBSTRING(nonkey_cols, 3, 8000) AS nonkey_cols,
ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates,
ius.last_user_seek, ius.last_user_scan, ius.last_user_lookup, ius.last_user_update,
fk.Reference_Count AS fk_ref_count,
DATEDIFF(DAY, CASE WHEN o.create_date > ca1.sql_startup_date THEN o.create_date
ELSE ca1.sql_startup_date END, GETDATE()) AS max_days_active,
FILEGROUP_NAME(i.data_space_id) AS filegroup_name,
ius.system_seeks, ius.system_scans, ius.system_lookups, ius.system_updates,
ius.last_system_seek, ius.last_system_scan, ius.last_system_lookup, ius.last_system_update
FROM sys.indexes i WITH (NOLOCK)
INNER JOIN sys.objects o WITH (NOLOCK) ON
o.object_id = i.object_id
CROSS JOIN (
SELECT create_date AS sql_startup_date FROM sys.databases WHERE name = 'tempdb'
) AS ca1
CROSS APPLY (
SELECT OBJECT_NAME(i.object_id/*, DB_ID()*/) AS table_name
) AS ca2
OUTER APPLY (
SELECT
', ' + COL_NAME(object_id, ic.column_id)
FROM sys.index_columns ic
WHERE
ic.key_ordinal > 0 AND
ic.object_id = i.object_id AND
ic.index_id = i.index_id
ORDER BY
ic.key_ordinal
FOR XML PATH('')
) AS key_cols (key_cols)
OUTER APPLY (
SELECT
', ' + COL_NAME(object_id, ic.column_id)
FROM sys.index_columns ic
WHERE
ic.key_ordinal = 0 AND
ic.object_id = i.object_id AND
ic.index_id = i.index_id
ORDER BY
COL_NAME(object_id, ic.column_id)
FOR XML PATH('')
) AS nonkey_cols (nonkey_cols)
LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON
dps.object_id = i.object_id AND
dps.index_id = i.index_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius WITH (NOLOCK) ON
ius.database_id = DB_ID() AND
ius.object_id = i.object_id AND
ius.index_id = i.index_id
LEFT OUTER JOIN (
SELECT
database_id, object_id, MAX(user_scans) AS user_scans,
ROW_NUMBER() OVER (ORDER BY MAX(user_scans) DESC) AS row_num --user_scans|user_seeks+user_scans
FROM sys.dm_db_index_usage_stats WITH (NOLOCK)
WHERE
database_id = DB_ID()
--AND index_id > 0
GROUP BY
database_id, object_id
) AS ius2 ON
ius2.database_id = DB_ID() AND
ius2.object_id = i.object_id
LEFT OUTER JOIN (
SELECT
referenced_object_id, COUNT(*) AS Reference_Count
FROM sys.foreign_keys
WHERE
is_disabled = 0
GROUP BY
referenced_object_id
) AS fk ON
fk.referenced_object_id = i.object_id
WHERE
i.object_id > 100 AND
i.is_hypothetical = 0 AND
i.type IN (0, 1, 2) AND
o.type NOT IN ( 'IF', 'IT', 'TF', 'TT' ) AND
(
o.name LIKE @table_name_pattern AND
o.name NOT LIKE 'dtprop%' AND
o.name NOT LIKE 'filestream[_]' AND
o.name NOT LIKE 'MSpeer%' AND
o.name NOT LIKE 'MSpub%' AND
--o.name NOT LIKE 'queue[_]%' AND
o.name NOT LIKE 'sys%'
)
--AND OBJECT_NAME(i.object_id /*, DB_ID()*/) IN ('tbl1', 'tbl2', 'tbl3')
ORDER BY
--row_count DESC,
--ius.user_scans DESC,
--ius2.row_num, --user_scans&|user_seeks
db_name, table_name,
-- list clustered index first, if any, then other index(es)
CASE WHEN i.index_id IN (0, 1) THEN 1 ELSE 2 END,
key_cols
PRINT 'Ended @ ' + CONVERT(varchar(30), GETDATE(), 120)
SET DEADLOCK_PRIORITY NORMAL
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 16, 2015 at 3:16 pm
If the PK is the single most accessed path to the data, yeah, I agree, it should be clustered, testing or not. Although, I prefer testing and advocate for it. But, if there's question about which column or columns define the most common access path to the data, I would absolutely resist putting on the PK just because it's the PK.
In terms of a link to a white paper, I'm not sure which one would tell you what you need. Of course testing something before doing it is the preferred method for just about anything you can do to your production system. You should measure before and after a change in order to know if the change worked or not. Just slapping stuff into the production system is dangerous (although, I agree, a clustered index isn't going to crash the system, but it sure will change how plans get generated, that's the idea).
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 16, 2015 at 3:39 pm
Wow thanks for all the replies. Very helpful.
I'll chew through it all tomorrow.
April 17, 2015 at 10:54 am
ScottPletcher (4/16/2015)
For example:
--USE [<your_db_name_here>] --naturally make sure you are in the right db
SET DEADLOCK_PRIORITY LOW --probably irrelevant, but just in case
DECLARE @list_missing_indexes bit
DECLARE @table_name_pattern sysname
--NOTE: showing missing indexes can take some time; set to 0 if you don't want to wait.
SET @list_missing_indexes = 1
SET @table_name_pattern = '??' --<<-- !table name goes here!
--SET @table_name_pattern = '%'
PRINT 'Started @ ' + CONVERT(varchar(30), GETDATE(), 120)
IF @list_missing_indexes = 1
BEGIN
SELECT
GETDATE() AS capture_date,
DB_NAME(mid.database_id) AS Db_Name,
dps.row_count,
OBJECT_NAME(mid.object_id /*, mid.database_id*/) AS Table_Name,
mid.equality_columns, mid.inequality_columns, mid.included_columns,
user_seeks, user_scans, ca1.max_days_active, unique_compiles,
last_user_seek, last_user_scan, avg_total_user_cost, avg_user_impact,
system_seeks, system_scans, last_system_seek, last_system_scan, avg_total_system_cost, avg_system_impact,
mid.statement, mid.object_id, mid.index_handle
FROM sys.dm_db_missing_index_details mid WITH (NOLOCK)
CROSS APPLY (
SELECT DATEDIFF(DAY, create_date, GETDATE()) AS max_days_active FROM sys.databases WHERE name = 'tempdb'
) AS ca1
LEFT OUTER JOIN sys.dm_db_missing_index_groups mig WITH (NOLOCK) ON
mig.index_handle = mid.index_handle
LEFT OUTER JOIN sys.dm_db_missing_index_group_stats migs WITH (NOLOCK) ON
migs.group_handle = mig.index_group_handle
LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON
dps.object_id = mid.object_id AND
dps.index_id IN (0, 1)
--order by
--DB_NAME, Table_Name, equality_columns
WHERE
1 = 1
AND mid.database_id = DB_ID() --only current db
AND OBJECT_NAME(mid.object_id) LIKE @table_name_pattern
--AND mid.object_id IN (OBJECT_ID('<table_name_1>'), OBJECT_ID('<table_name_2>'))
ORDER BY
--avg_total_user_cost * (user_seeks + user_scans) DESC,
Db_Name, Table_Name, equality_columns, inequality_columns
END --IF
PRINT 'Midpoint @ ' + CONVERT(varchar(30), GETDATE(), 120)
-- list index usage stats (seeks, scans, etc.)
SELECT
ius2.row_num, DB_NAME() AS db_name,
CASE WHEN i.name LIKE ca2.table_name + '%'
THEN '~' + SUBSTRING(i.name, LEN(ca2.table_name) + 1, 200)
ELSE i.name END AS index_name,
CASE WHEN i.is_unique = 0 THEN 'N' ELSE 'Y' END + '.' +
CASE WHEN i.is_primary_key = 0 AND i.is_unique_constraint = 0 THEN 'N' ELSE 'Y' END AS [uniq?],
ca2.table_name,
i.index_id, --ius.user_seeks + ius.user_scans AS total_reads,
dps.row_count,
SUBSTRING(key_cols, 3, 8000) AS key_cols, SUBSTRING(nonkey_cols, 3, 8000) AS nonkey_cols,
ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates,
ius.last_user_seek, ius.last_user_scan, ius.last_user_lookup, ius.last_user_update,
fk.Reference_Count AS fk_ref_count,
DATEDIFF(DAY, CASE WHEN o.create_date > ca1.sql_startup_date THEN o.create_date
ELSE ca1.sql_startup_date END, GETDATE()) AS max_days_active,
FILEGROUP_NAME(i.data_space_id) AS filegroup_name,
ius.system_seeks, ius.system_scans, ius.system_lookups, ius.system_updates,
ius.last_system_seek, ius.last_system_scan, ius.last_system_lookup, ius.last_system_update
FROM sys.indexes i WITH (NOLOCK)
INNER JOIN sys.objects o WITH (NOLOCK) ON
o.object_id = i.object_id
CROSS JOIN (
SELECT create_date AS sql_startup_date FROM sys.databases WHERE name = 'tempdb'
) AS ca1
CROSS APPLY (
SELECT OBJECT_NAME(i.object_id/*, DB_ID()*/) AS table_name
) AS ca2
OUTER APPLY (
SELECT
', ' + COL_NAME(object_id, ic.column_id)
FROM sys.index_columns ic
WHERE
ic.key_ordinal > 0 AND
ic.object_id = i.object_id AND
ic.index_id = i.index_id
ORDER BY
ic.key_ordinal
FOR XML PATH('')
) AS key_cols (key_cols)
OUTER APPLY (
SELECT
', ' + COL_NAME(object_id, ic.column_id)
FROM sys.index_columns ic
WHERE
ic.key_ordinal = 0 AND
ic.object_id = i.object_id AND
ic.index_id = i.index_id
ORDER BY
COL_NAME(object_id, ic.column_id)
FOR XML PATH('')
) AS nonkey_cols (nonkey_cols)
LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON
dps.object_id = i.object_id AND
dps.index_id = i.index_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius WITH (NOLOCK) ON
ius.database_id = DB_ID() AND
ius.object_id = i.object_id AND
ius.index_id = i.index_id
LEFT OUTER JOIN (
SELECT
database_id, object_id, MAX(user_scans) AS user_scans,
ROW_NUMBER() OVER (ORDER BY MAX(user_scans) DESC) AS row_num --user_scans|user_seeks+user_scans
FROM sys.dm_db_index_usage_stats WITH (NOLOCK)
WHERE
database_id = DB_ID()
--AND index_id > 0
GROUP BY
database_id, object_id
) AS ius2 ON
ius2.database_id = DB_ID() AND
ius2.object_id = i.object_id
LEFT OUTER JOIN (
SELECT
referenced_object_id, COUNT(*) AS Reference_Count
FROM sys.foreign_keys
WHERE
is_disabled = 0
GROUP BY
referenced_object_id
) AS fk ON
fk.referenced_object_id = i.object_id
WHERE
i.object_id > 100 AND
i.is_hypothetical = 0 AND
i.type IN (0, 1, 2) AND
o.type NOT IN ( 'IF', 'IT', 'TF', 'TT' ) AND
(
o.name LIKE @table_name_pattern AND
o.name NOT LIKE 'dtprop%' AND
o.name NOT LIKE 'filestream[_]' AND
o.name NOT LIKE 'MSpeer%' AND
o.name NOT LIKE 'MSpub%' AND
--o.name NOT LIKE 'queue[_]%' AND
o.name NOT LIKE 'sys%'
)
--AND OBJECT_NAME(i.object_id /*, DB_ID()*/) IN ('tbl1', 'tbl2', 'tbl3')
ORDER BY
--row_count DESC,
--ius.user_scans DESC,
--ius2.row_num, --user_scans&|user_seeks
db_name, table_name,
-- list clustered index first, if any, then other index(es)
CASE WHEN i.index_id IN (0, 1) THEN 1 ELSE 2 END,
key_cols
PRINT 'Ended @ ' + CONVERT(varchar(30), GETDATE(), 120)
SET DEADLOCK_PRIORITY NORMAL
Scott, thanks for the code! Very interesting results, definitely food for thought. It will be interesting to see how the results change as I move from development through user testing to limited deployment.
-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]
April 17, 2015 at 6:48 pm
Remember the clustered index represents the order of the table, the primary key represents what is unique about each row. They are not necessarily the same as you know by now. You have a valid point as some queries will benefit from the new order and others may not. Keep in mind that new non-clustered indexes with an INCLUDE part can help you overcome any performance queries that result.
The simple way to go about it is to drop all indexes , then create the clustered index , then the non clustered ones. Not 100% but this method may also give you space back (if your heap had rows deleted from time to time).
----------------------------------------------------
April 19, 2015 at 5:07 pm
Your biggest hint as to best candidate for clustered index is highest number of the sum of seeks grouped by the leading index key.
Also, a symptom of poor choice of clustering or non existent clustering would be a smattering of either many space eating covering indexes, or extreme amount of lookups.
It's a no-brainer really. (except for vendors who never seem to include a DBA in their designs).
Probably best to find all the queries in the plan cache which hit the table and store the statistics and exec plans.
Then with those numbers, you have a baseline.
Then you can figure out, by looking at the queries and plans, which queries will benefit, and which will not,visually before even changing the indexing.
I've changes nonclustered PK's to clustered for most cases found and lookups disappear, blocking was heavily reduced, fat covering indexes no longer needed and dropped, deadlocks disappeared, and these were tables which were originally clustered on the wrong column, i.e. not as bad as heaps.
Be careful of fillfactor, else it is possible that your table ends up taking up more space, unless that is what is expected.
April 20, 2015 at 9:30 am
The simple way to go about it is to drop all indexes , then create the clustered index , then the non clustered ones. Not 100% but this method may also give you space back (if your heap had rows deleted from time to time).
I second this method. If you don't drop the non-clustered indexes first, you may find that the process takes a considerable amount of time adding the clustered index. It will normally go much faster the other way.
I've changes nonclustered PK's to clustered for most cases found and lookups disappear, blocking was heavily reduced, fat covering indexes no longer needed and dropped, deadlocks disappeared, and these were tables which were originally clustered on the wrong column, i.e. not as bad as heaps. Be careful of fillfactor, else it is possible that your table ends up taking up more space, unless that is what is expected.
If your choice of a clustered index is such that new records could be inserted anywhere, you should have a fill factor. This will reduce the speed at which the table fragments. If it's sequential (even if not an identity but something like a creation date), a fill factor is probably not necessary.
As for whether the PK could be clustered or not, that is a table by table decision. We have recently seen a lot of improvements in an application by moving the PKs from the primary key to some other more appropriate column or combination of columns (although rarely is more than one column used). The tables had been reflexively clustered on the PK. This is often not the best decision.
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply