Changing the clustered index

Question

Post reply

Changing the clustered index

djj

SSCoach

Points: 18831
More actions
February 27, 2015 at 12:01 pm

#398096

Hello,
After reading some comments here I decided to look at tables to see if any had a clustered index that was a unique identifier. Yep. So if I have a table with a unique identifier as the primary key/clustered index and an identity column that is indexed, I would like to make the identity a clustered index (maybe even the primary key) and make the unique identifier a unique non-clustered index (not the primary key).
Does this sound reasonable?
If I do this will I need to drop and recreate the other indexes? Or maybe just rebuild the other indexes?
Currently:
CREATE TABLE Payments (
IDX INT IDENTITY(1,1) NOT NULL,
GUID UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWID()),
..... -- many other columns
);
GO
ALTER TABLE [dbo].[PAYMENTS] ADD CONSTRAINT [PK_PAYMENTS_GID] PRIMARY KEY CLUSTERED ([GUID] ASC);
GO
CREATE NONCLUSTERED INDEX [IX_Payments_ID] ON [dbo].[PAYMENTS] ([IDX] ASC);
GO
Would like:
ALTER TABLE [dbo].[PAYMENTS] ADD CONSTRAINT [PK_PAYMENTS_IDX] PRIMARY KEY CLUSTERED (IDX ASC);
GO
CREATE UNIQUE NONCLUSTERED INDEX [IX_Payments_GUID] ON [dbo].[PAYMENTS] (GUID ASC);
GO
Thank you

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply

sql-lover SSCoach Points: 18530 More actions · Answer 1

Arghhhhhhhhhhhhh! :crazy: ...

Did I read well? A GUID as a Clustered Index? Yes! Eliminate that if you can!

Take a look on this article[/url] from Kimberly.

ScottPletcher SSC Guru Points: 101238 More actions · Answer 2

Before making such a critical choice, review SQL's missing index and index usage stats. Odds are, you have a better choice than an identity column for the clustering index key. That is, more than 50% of the time, overall performance is better clustering on something other than just a meaningless identity value.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Grant Fritchey SSC Guru Points: 399379 More actions · Answer 3

I'm not a fan of GUIDs, but they are not inherently evil. Before you replace it, I'd find out why you have it. If there's no good reason, then by all means, rip it out. But be sure there's not a good purpose there.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

djj SSCoach Points: 18831 More actions · Answer 4

The only unique things in the table are IDX and GUID. Using a combination of columns it is still not possible to get a unique combination. Since this table has been around for about ten years, it is not normalized, but is used everywhere. 🙁

djj SSCoach Points: 18831 More actions · Answer 5

Grant Fritchey (2/27/2015)
I'm not a fan of GUIDs, but they are not inherently evil. Before you replace it, I'd find out why you have it. If there's no good reason, then by all means, rip it out. But be sure there's not a good purpose there.

I will keep the GUID column but just change from primary/clustered to unique non-clustered index.

The other option is to change the clustered index and keep the GUID the primary key.

ScottPletcher SSC Guru Points: 101238 More actions · Answer 6

djj (2/27/2015)
The only unique things in the table are IDX and GUID. Using a combination of columns it is still not possible to get a unique combination. Since this table has been around for about ten years, it is not normalized, but is used everywhere. 🙁

Your clustered index key(s) doesn't(don't) have to be unique by itself(themselves); SQL will take care of that if it needs to.

And that's a secondary concern. The most critical performance factor is that you get the best clustered index for that table.

This code will show you the unused index stats and the index usage stats for the table(s).

--USE [<your_db_name_here>] --naturally make sure you are in the right db

SET DEADLOCK_PRIORITY LOW --probably irrelevant, but just in case

DECLARE @list_missing_indexes bit

DECLARE @table_name_pattern sysname

--NOTE: showing missing indexes can take some time; set to 0 if you don't want to wait.

SET @list_missing_indexes = 1

SET @table_name_pattern = '%' --<<-- !your table name/pattern goes here!

--SET @table_name_pattern = '%'

PRINT 'Started @ ' + CONVERT(varchar(30), GETDATE(), 120)

IF @list_missing_indexes = 1

BEGIN

SELECT

GETDATE() AS capture_date,

DB_NAME(mid.database_id) AS Db_Name,

dps.row_count,

OBJECT_NAME(mid.object_id /*, mid.database_id*/) AS Table_Name,

mid.equality_columns, mid.inequality_columns, mid.included_columns,

user_seeks, user_scans, ca1.max_days_active, unique_compiles,

last_user_seek, last_user_scan, avg_total_user_cost, avg_user_impact,

system_seeks, system_scans, last_system_seek, last_system_scan, avg_total_system_cost, avg_system_impact,

mid.statement, mid.object_id, mid.index_handle

FROM sys.dm_db_missing_index_details mid WITH (NOLOCK)

CROSS APPLY (

SELECT DATEDIFF(DAY, create_date, GETDATE()) AS max_days_active FROM sys.databases WHERE name = 'tempdb'

) AS ca1

LEFT OUTER JOIN sys.dm_db_missing_index_groups mig WITH (NOLOCK) ON

mig.index_handle = mid.index_handle

LEFT OUTER JOIN sys.dm_db_missing_index_group_stats migs WITH (NOLOCK) ON

migs.group_handle = mig.index_group_handle

LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON

dps.object_id = mid.object_id AND

dps.index_id IN (0, 1)

--order by

--DB_NAME, Table_Name, equality_columns

WHERE

1 = 1

AND mid.database_id = DB_ID() --only current db

AND OBJECT_NAME(mid.object_id) LIKE @table_name_pattern

--AND mid.object_id IN (OBJECT_ID('<table_name_1>'), OBJECT_ID('<table_name_2>'))

ORDER BY

--avg_total_user_cost * (user_seeks + user_scans) DESC,

Db_Name, Table_Name, equality_columns, inequality_columns

END --IF

PRINT 'Midpoint @ ' + CONVERT(varchar(30), GETDATE(), 120)

-- list index usage stats (seeks, scans, etc.)

SELECT

ius2.row_num, DB_NAME() AS db_name,

CASE WHEN i.name LIKE ca2.table_name + '%'

THEN '~' + SUBSTRING(i.name, LEN(ca2.table_name) + 1, 200)

ELSE i.name END AS index_name,

CASE WHEN i.is_unique = 0 THEN 'N' ELSE 'Y' END + '.' +

CASE WHEN i.is_primary_key = 0 AND i.is_unique_constraint = 0 THEN 'N' ELSE 'Y' END AS [uniq?],

ca2.table_name,

i.index_id, --ius.user_seeks + ius.user_scans AS total_reads,

dps.row_count,

SUBSTRING(key_cols, 3, 8000) AS key_cols, SUBSTRING(nonkey_cols, 3, 8000) AS nonkey_cols,

ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates,

ius.last_user_seek, ius.last_user_scan, ius.last_user_lookup, ius.last_user_update,

fk.Reference_Count AS fk_ref_count,

DATEDIFF(DAY, CASE WHEN o.create_date > ca1.sql_startup_date THEN o.create_date

ELSE ca1.sql_startup_date END, GETDATE()) AS max_days_active,

FILEGROUP_NAME(i.data_space_id) AS filegroup_name,

ius.system_seeks, ius.system_scans, ius.system_lookups, ius.system_updates,

ius.last_system_seek, ius.last_system_scan, ius.last_system_lookup, ius.last_system_update

FROM sys.indexes i WITH (NOLOCK)

INNER JOIN sys.objects o WITH (NOLOCK) ON

o.object_id = i.object_id

CROSS JOIN (

SELECT create_date AS sql_startup_date FROM sys.databases WHERE name = 'tempdb'

) AS ca1

CROSS APPLY (

SELECT OBJECT_NAME(i.object_id/*, DB_ID()*/) AS table_name

) AS ca2

OUTER APPLY (

SELECT

', ' + COL_NAME(object_id, ic.column_id)

FROM sys.index_columns ic

WHERE

ic.key_ordinal > 0 AND

ic.object_id = i.object_id AND

ic.index_id = i.index_id

ORDER BY

ic.key_ordinal

FOR XML PATH('')

) AS key_cols (key_cols)

OUTER APPLY (

SELECT

', ' + COL_NAME(object_id, ic.column_id)

FROM sys.index_columns ic

WHERE

ic.key_ordinal = 0 AND

ic.object_id = i.object_id AND

ic.index_id = i.index_id

ORDER BY

COL_NAME(object_id, ic.column_id)

FOR XML PATH('')

) AS nonkey_cols (nonkey_cols)

LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON

dps.object_id = i.object_id AND

dps.index_id = i.index_id

LEFT OUTER JOIN sys.dm_db_index_usage_stats ius WITH (NOLOCK) ON

ius.database_id = DB_ID() AND

ius.object_id = i.object_id AND

ius.index_id = i.index_id

LEFT OUTER JOIN (

SELECT

database_id, object_id, MAX(user_scans) AS user_scans,

ROW_NUMBER() OVER (ORDER BY MAX(user_scans) DESC) AS row_num --user_scans|user_seeks+user_scans

FROM sys.dm_db_index_usage_stats WITH (NOLOCK)

WHERE

database_id = DB_ID()

--AND index_id > 0

GROUP BY

database_id, object_id

) AS ius2 ON

ius2.database_id = DB_ID() AND

ius2.object_id = i.object_id

LEFT OUTER JOIN (

SELECT

referenced_object_id, COUNT(*) AS Reference_Count

FROM sys.foreign_keys

WHERE

is_disabled = 0

GROUP BY

referenced_object_id

) AS fk ON

fk.referenced_object_id = i.object_id

WHERE

i.object_id > 100 AND

i.is_hypothetical = 0 AND

i.type IN (0, 1, 2) AND

o.type NOT IN ( 'IF', 'IT', 'TF', 'TT' ) AND

(

o.name LIKE @table_name_pattern AND

o.name NOT LIKE 'dtprop%' AND

o.name NOT LIKE 'filestream[_]' AND

o.name NOT LIKE 'MSpeer%' AND

o.name NOT LIKE 'MSpub%' AND

--o.name NOT LIKE 'queue[_]%' AND

o.name NOT LIKE 'sys%'

)

--AND OBJECT_NAME(i.object_id /*, DB_ID()*/) IN ('tbl1', 'tbl2', 'tbl3')

ORDER BY

--row_count DESC,

--ius.user_scans DESC,

--ius2.row_num, --user_scans&|user_seeks

db_name, table_name,

-- list clustered index first, if any, then other index(es)

CASE WHEN i.index_id IN (0, 1) THEN 1 ELSE 2 END,

key_cols

PRINT 'Ended @ ' + CONVERT(varchar(30), GETDATE(), 120)

SET DEADLOCK_PRIORITY NORMAL

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

sql-lover SSCoach Points: 18530 More actions · Answer 7

djj (2/27/2015)
Grant Fritchey (2/27/2015)
I'm not a fan of GUIDs, but they are not inherently evil. Before you replace it, I'd find out why you have it. If there's no good reason, then by all means, rip it out. But be sure there's not a good purpose there.
I will keep the GUID column but just change from primary/clustered to unique non-clustered index.
The other option is to change the clustered index and keep the GUID the primary key.

My initial comment was about that GUID being a Cluster Index, not dropping the GUID, if your app really uses it. A GUID as a Cluster Index is bad! The fragmentation will kill you.

If that GUID should be a unique NCI or PK, it's up to you. They internally act pretty much the same, except that primary key column cannot be nullable, unique index column can, I think. But for some third party apps and if you are trying to adhere to the relational model, a PK should be deployed in SQL as an actual PK.

djj SSCoach Points: 18831 More actions · Answer 8

Thanks everyone.

Scott, I will try your code later as I am getting ready to leave. 😀

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 9

ScottPletcher (2/27/2015)
Your clustered index key(s) doesn't(don't) have to be unique by itself(themselves); SQL will take care of that if it needs to.

... at possibly quite some cost if the clustered index isn't unique thanks to the "uniquefier" column that will appear for non-unique rows.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101238 More actions · Answer 10

Jeff Moden (2/27/2015)
ScottPletcher (2/27/2015)
Your clustered index key(s) doesn't(don't) have to be unique by itself(themselves); SQL will take care of that if it needs to.
... at possibly quite some cost if the clustered index isn't unique thanks to the "uniquefier" column that will appear for non-unique rows.

Or at virtually no cost, depending on what % of rows have dups, how many dups there are, and how many rows fit on a page. But, again, all those are secondary concerns compared to the primary concern of getting the best clustering index key(s).

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 11

ScottPletcher (2/27/2015)
Jeff Moden (2/27/2015)
ScottPletcher (2/27/2015)
Your clustered index key(s) doesn't(don't) have to be unique by itself(themselves); SQL will take care of that if it needs to.
... at possibly quite some cost if the clustered index isn't unique thanks to the "uniquefier" column that will appear for non-unique rows.
Or at virtually no cost, depending on what % of rows have dups, how many dups there are, and how many rows fit on a page. But, again, all those are secondary concerns compared to the primary concern of getting the best clustering index key(s).

I disagree, Scott. Those should all be primary concerns for selecting what the best clustered index is. As you say, there may be no cost to using a non-unique clustered index... or there may be a huge cost that could actually be repaired fairly easily (using methods you hate, oddly enough) but, if you don't know, then any change is a shot in the dark especially on large tables.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101238 More actions · Answer 12

Jeff Moden (2/27/2015)
ScottPletcher (2/27/2015)
Jeff Moden (2/27/2015)
ScottPletcher (2/27/2015)
Your clustered index key(s) doesn't(don't) have to be unique by itself(themselves); SQL will take care of that if it needs to.
... at possibly quite some cost if the clustered index isn't unique thanks to the "uniquefier" column that will appear for non-unique rows.
Or at virtually no cost, depending on what % of rows have dups, how many dups there are, and how many rows fit on a page. But, again, all those are secondary concerns compared to the primary concern of getting the best clustering index key(s).
I disagree, Scott. Those should all be primary concerns for selecting what the best clustered index is. As you say, there may be no cost to using a non-unique clustered index... or there may be a huge cost that could actually be repaired fairly easily (using methods you hate, oddly enough) but, if you don't know, then any change is a shot in the dark especially on large tables.

Don't be so obsessed with the INSERT process. Each row is Selected 100x or more for its 1 INSERT. Thus, I believe getting the best clustered index keys is first; you can deal with duplicates, if necessary, later, except perhaps on very large tables. Often avoiding gazillions of covering indexes will gain you much more than you lose by duplicate key values.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 13

ScottPletcher (2/27/2015)
Jeff Moden (2/27/2015)
ScottPletcher (2/27/2015)
Jeff Moden (2/27/2015)
ScottPletcher (2/27/2015)
Your clustered index key(s) doesn't(don't) have to be unique by itself(themselves); SQL will take care of that if it needs to.
... at possibly quite some cost if the clustered index isn't unique thanks to the "uniquefier" column that will appear for non-unique rows.
Or at virtually no cost, depending on what % of rows have dups, how many dups there are, and how many rows fit on a page. But, again, all those are secondary concerns compared to the primary concern of getting the best clustering index key(s).
I disagree, Scott. Those should all be primary concerns for selecting what the best clustered index is. As you say, there may be no cost to using a non-unique clustered index... or there may be a huge cost that could actually be repaired fairly easily (using methods you hate, oddly enough) but, if you don't know, then any change is a shot in the dark especially on large tables.
Don't be so obsessed with the INSERT process. Each row is Selected 100x or more for its 1 INSERT. Thus, I believe getting the best clustered index keys is first; you can deal with duplicates, if necessary, later, except perhaps on very large tables. Often avoiding gazillions of covering indexes will gain you much more than you lose by duplicate key values.

You say that but I've seen systems that have been instantly and absolutely paralyzed just by someone adding the wrong index. My personal example is that I single handedly paralyzed a part of Expedia.com for a short period just by adding a the wrong index that was more concerned with SELECT performance than INSERT performance.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101238 More actions · Answer 14

Jeff Moden (2/27/2015)
ScottPletcher (2/27/2015)
Jeff Moden (2/27/2015)
ScottPletcher (2/27/2015)
Jeff Moden (2/27/2015)
ScottPletcher (2/27/2015)
Your clustered index key(s) doesn't(don't) have to be unique by itself(themselves); SQL will take care of that if it needs to.
... at possibly quite some cost if the clustered index isn't unique thanks to the "uniquefier" column that will appear for non-unique rows.
Or at virtually no cost, depending on what % of rows have dups, how many dups there are, and how many rows fit on a page. But, again, all those are secondary concerns compared to the primary concern of getting the best clustering index key(s).
I disagree, Scott. Those should all be primary concerns for selecting what the best clustered index is. As you say, there may be no cost to using a non-unique clustered index... or there may be a huge cost that could actually be repaired fairly easily (using methods you hate, oddly enough) but, if you don't know, then any change is a shot in the dark especially on large tables.
Don't be so obsessed with the INSERT process. Each row is Selected 100x or more for its 1 INSERT. Thus, I believe getting the best clustered index keys is first; you can deal with duplicates, if necessary, later, except perhaps on very large tables. Often avoiding gazillions of covering indexes will gain you much more than you lose by duplicate key values.
You say that but I've seen systems that have been instantly and absolutely paralyzed just by someone adding the wrong index. My personal example is that I single handedly paralyzed a part of Expedia.com for a short period just by adding a the wrong index that was more concerned with SELECT performance than INSERT performance.

Can't believe an index on a single key value that is explicitly specified in every user request could be "bad" or "wrong". Seems bizarre to me to consider clustering on anything else given that info going on, barring extremely unusual insert patterns, except based on a pre-determined obsession with using only "narrow, ever-increasing, ..." clustering key values.

I've also not had an index based on careful analysis of missing index and index usage stats, and a few other stats I review, cause me such an issue.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".