GUIDs as clustered index

Question

GUIDs as clustered index

Viewing 15 posts - 16 through 30 (of 58 total)

You must be logged in to reply to this topic. Login to reply

vsamantha35 SSChampion Points: 11551 More actions · Answer 1

frederico_fonseca - Thursday, May 31, 2018 3:58 AM
Can you supply the DDL for the original tables, not the ones you have altered to add the newid column

These are Live tables. There are like 70 odd tables suffering with GUID issues on PROD. So, instead of providing all that information, I just picked 1 master table and 2 child tables.

Jeff Moden SSC Guru Points: 1004582 More actions · Answer 2

frederico_fonseca - Wednesday, May 30, 2018 6:23 PM
Jeff Moden - Wednesday, May 30, 2018 3:05 PM
p.s. While I hate the use of GUIDs (I simply don't like looking at them or typing them) and totally contrary to popular belief, it's actually quite easy to manage them and their indexes. The fact that they're random (unless you're also making the mistake of using sequential GUIDs) doesn't actually matter much (in fact... they great for inserts because they prevent the "IDENTITY HOTSPOT" problem) if you have the INSERT/expAnsive UPDATE problem that I told you about above.
Seriously... get back to me with the stuff I asked for before you good folks go making changes that may not help at all.
Out of curiosity why do you say "(unless you're also making the mistake of using sequential GUIDs)"

Because they're sequential, they have the same hot spot issues as an IDENTITY column and are guaranteed to do page splits even if they're mostly good page splits (they still take extra time and every ms counts on nasty busy systems). They're also not entirely sequential. Just before you do a restart of the SQL Server Service or do an all out reboot, determine what the highest value GUID is (they do sort funny so be prepared) and then see what the new GUIDs being generated are after the reboot. There's a 50/50 chance that they'll be less than what the previously determined max is and that can result in a flurry of page splits right after the reboot and right after defragmentation. They're (all GUIDs, sequential or not) also 16 bytes wide compared to even a BIGINT-based IDENTITY column is and an IDENTITY column is guaranteed to NOT have a lesser value after a reboot.. Remember that if you use the GUID as the lead column in a Clustered Key/Index, it will also appear in ALL non-clustered indexes.

As a bit of a side bar, the GUIDs used in SQL Server are "Type 4" GUIDs, which means that they're nothing more than a huge pseudo random number. While that means that its astronomically not likely that you'll ever generate a dupe on a single box, you could end up with dupes if you import from multiple boxes. You MUST add a unique constraint if you truly want to guarantee uniqueness.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004582 More actions · Answer 3

Sergiy - Wednesday, May 30, 2018 6:48 PM
Whatever kind of identity you choose to use - it's the worst possible choice for a clustered key.
The best candidate for a clustered index is a column (set of columns) which is usually used for range selection.
I've never seen a query using something like "WHERE ID BETWEEN @FromID and @ToID"

As always, "It Depends". While I absolutely agree that most people won't ever have cause to use the WHERE clause you indicated, it's incredibly useful in avoiding any bad page splits without having to do any index maintenance at all for INSERT-only CIs. It's also incredibly useful for OLTP where the IDENTITY column is used as the main lookup. Range selection on a predominately OLTP table isn't the best justification for the selection of the clustered key, especially if someone has a poorly designed table that has very wide rows.

Even on predominately batch-oriented tables, if the row width is excessive and a lot of the columns are never used for reporting, the use of a clustered index to satisfy common query criteria to support range scans may (and frequently does) slow things down a fair bit because it has to deal with the entire row width at the leaf level during a range scan. That would cause a lot of unnecessary data to be loaded into memory as well as causing a relatively huge amount of logical reads, which can be a serious performance problem because even at memory channel speeds, there are still limits and reading through unnecessary page content doesn't make sense.

As with making the mistake of automatically assuming the an IDENTITY column should be used on every table, automatically assuming that the CI should be based on something to support range scans may also be a mistake. "Must look eye".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004582 More actions · Answer 4

vsamantha35 - Thursday, May 31, 2018 2:15 AM
Jeff Moden - Wednesday, May 30, 2018 3:28 PM
vsamantha35 - Wednesday, May 30, 2018 3:14 PM
Hi Jeff,
Thanks for taking time and providing insights. thanks.
My recommendation is to post the CREATE TABLE code for the table in question along with ALL indexes/constraints so I can have a look at it.
-- I can get you the information.
I also need to know if the table is suffering from the INSERT/expAnsive UPDATE problem that I told you about above.
-- how can we validate this? what u r saying , if that table is having more INSERTS or more UPDATES??? Is there any DMV to pull this information out?
I might be able to SWAG IT by looking at the CREATE TABLE statement and the indexes. Also, how many rows in your largest 4 tables?
Hi Jeff, Please find the attached tablescript. For sake for simplicity considering 1 parent table and 2 child tables which are GUIDs.
Also, on prod I already have existing data, how can I use an UPDATE statement and replacing the GUIDs with an int/bigint column value and same homework has to be done to all the child tables, how would be my update statement in parent child tables and permanantly get rid of GUIDs then what approach to be taken. Could you please give simple demo example with some few rows on how I can get rid of those GUIDS permanantly from parent and child table.
Again, thanks everyone for the help so far.

Thanks, Sam. I'll do a deep dive on these after work tonight. A cursory glance says there are some problems that are leading to your fragmentation issues (as well as some other basic issues) and changing the Clustered Index to the "NewID" doesn't look like it's actually going to fix your fragmentation issue on at least one of the tables you provided the DDL for. In fact, it may make things worse by creating an "Insert Hot Spot" and provide you with a new fragmentation problem because the table does look like it's an INSERT/ExpAnsive UPDATE table.

In the mean time, please confirm that you actually are using SQL Server 2012 and let me know if you're using the Standard Edition or the Enterprise Edition. I also need to you run the following code (in the same database as the 3 tables live) near the end of today before you do any index maintenance on the 3 tables you provided (or run it now if it's been a day or two since you've run your index maintenance routines). Run the code in the "Grid Results" mode, copy and paste it into a spreadsheet, and attach the spreadsheet. It'll give me some great understanding as to what we're dealing with on a daily basis.

SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('AMT.ItemVersions'),NULL,NULL,'SAMPLED') UNION ALL SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('AMT.ItemTags'),NULL,NULL,'SAMPLED') UNION ALL SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('Questionnaire.ItemActions'),NULL,NULL,'SAMPLED') ;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101189 More actions · Answer 5

Jeff Moden - Thursday, May 31, 2018 8:32 AM
Sergiy - Wednesday, May 30, 2018 6:48 PM
Whatever kind of identity you choose to use - it's the worst possible choice for a clustered key.
The best candidate for a clustered index is a column (set of columns) which is usually used for range selection.
I've never seen a query using something like "WHERE ID BETWEEN @FromID and @ToID"
As always, "It Depends". While I absolutely agree that most people won't ever have cause to use the WHERE clause you indicated, it's incredibly useful in avoiding any bad page splits without having to do any index maintenance at all for INSERT-only CIs. It's also incredibly useful for OLTP where the IDENTITY column is used as the main lookup. Range selection on a predominately OLTP table isn't the best justification for the selection of the clustered key, especially if someone has a poorly designed table that has very wide rows.
Even on predominately batch-oriented tables, if the row width is excessive and a lot of the columns are never used for reporting, the use of a clustered index to satisfy common query criteria to support range scans may (and frequently does) slow things down a fair bit because it has to deal with the entire row width at the leaf level during a range scan. That would cause a lot of unnecessary data to be loaded into memory as well as causing a relatively huge amount of logical reads, which can be a serious performance problem because even at memory channel speeds, there are still limits and reading through unnecessary page content doesn't make sense.
As with making the mistake of automatically assuming the an IDENTITY column should be used on every table, automatically assuming that the CI should be based on something to support range scans may also be a mistake. "Must look eye".

Absolutely, it depends. Unquestionably identity is sometimes useful as a clustering key, for example, the customer "master" table, where customer_id is almost always the row identifier and is the clus key as well.

But, we've all seen tables clustered on identity that get fragmented. It's just a myth that they don't. It's rarer than for other key types, but it's certainly possible.

As to "unnecessary" data loads for tables with lots of columns, the major part of the problem there is a lack of data modeling, including proper normalization, before any table creation. The vast majority of people completely skip that step, claiming it's not needed. Some of the bad results are ultra-wide tables and having to work around them later to avoid poor performance, or to alleviate it as best as they can.

Skipping modeling also skips encoding data. Long strings, esp. repetitive ones, should often be encoded to a numeric value. For example, many tables now automatically have "user_added" and "user_modified" columns added, as varchars. Those should be numeric codes, rather than repeating the strings in every row (better yet, that type of data should be stored externally to the table, but most shops don't consider that at all). Statuses should never be "Complete", "Shipped", etc., but a code representing that (tinyint is usually more than sufficient).

Finally, back to clustering itself, I've observed that even for people who consider keys other than ident, the general rule is to "default" to identity if nothing else is clearly better. That's just not correct in real world practice. I've tuned literally thousands of SQL dbs, and I've seen literally only a handful of those where 50% or more of the tables were properly clustered by identity. So, the majority of the time, identity is not the best clustering key for the table.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004582 More actions · Answer 6

ScottPletcher - Thursday, May 31, 2018 9:18 AM
Jeff Moden - Thursday, May 31, 2018 8:32 AM
Sergiy - Wednesday, May 30, 2018 6:48 PM
Whatever kind of identity you choose to use - it's the worst possible choice for a clustered key.
The best candidate for a clustered index is a column (set of columns) which is usually used for range selection.
I've never seen a query using something like "WHERE ID BETWEEN @FromID and @ToID"
As always, "It Depends". While I absolutely agree that most people won't ever have cause to use the WHERE clause you indicated, it's incredibly useful in avoiding any bad page splits without having to do any index maintenance at all for INSERT-only CIs. It's also incredibly useful for OLTP where the IDENTITY column is used as the main lookup. Range selection on a predominately OLTP table isn't the best justification for the selection of the clustered key, especially if someone has a poorly designed table that has very wide rows.
Even on predominately batch-oriented tables, if the row width is excessive and a lot of the columns are never used for reporting, the use of a clustered index to satisfy common query criteria to support range scans may (and frequently does) slow things down a fair bit because it has to deal with the entire row width at the leaf level during a range scan. That would cause a lot of unnecessary data to be loaded into memory as well as causing a relatively huge amount of logical reads, which can be a serious performance problem because even at memory channel speeds, there are still limits and reading through unnecessary page content doesn't make sense.
As with making the mistake of automatically assuming the an IDENTITY column should be used on every table, automatically assuming that the CI should be based on something to support range scans may also be a mistake. "Must look eye".
Absolutely, it depends. Unquestionably identity is sometimes useful as a clustering key, for example, the customer "master" table, where customer_id is almost always the row identifier and is the clus key as well.
But, we've all seen tables clustered on identity that get fragmented. It's just a myth that they don't. It's rarer than for other key types, but it's certainly possible.
As to "unnecessary" data loads for tables with lots of columns, the major part of the problem there is a lack of data modeling, including proper normalization, before any table creation. The vast majority of people completely skip that step, claiming it's not needed. Some of the bad results are ultra-wide tables and having to work around them later to avoid poor performance, or to alleviate it as best as they can.
Skipping modeling also skips encoding data. Long strings, esp. repetitive ones, should often be encoded to a numeric value. For example, many tables now automatically have "user_added" and "user_modified" columns added, as varchars. Those should be numeric codes, rather than repeating the strings in every row (better yet, that type of data should be stored externally to the table, but most shops don't consider that at all). Statuses should never be "Complete", "Shipped", etc., but a code representing that (tinyint is usually more than sufficient).
Finally, back to clustering itself, I've observed that even for people who consider keys other than ident, the general rule is to "default" to identity if nothing else is clearly better. That's just not correct in real world practice. I've tuned literally thousands of SQL dbs, and I've seen literally only a handful of those where 50% or more of the tables were properly clustered by identity. So, the majority of the time, identity is not the best clustering key for the table.

Correct. There is actually no form of "ever increasing key" that is impervious to page splits if they are later updated with "ExpAnsive" data, such as with the variable width columns that people use for things like status or "ModifiedBY". Sam's team actually did avoid that serious mistake but some of their columns still make at least one of the 3 tables provided an INSERT/Expansive UPDATE table and that's a real problem (as we both agree on) when it comes to perpetuating bad page splits, perpetuating fragmentation, perpetuating waste of disk space and memory and (ironically in one of the worst Catch-22s there is) perpetuating the need for index maintenance that doesn't actually prevent any of that.

I also agree on the idea of automatically clustering on an IDENTITY column being a bit idiotic. There are cases, such as those we have in several of our larger databases where it actually is a great idea (as you said) and it covers more like 95% of the tables.

I'm absolutely with you that design should never happen simply through "default" thinking.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101189 More actions · Answer 7

Jeff Moden - Thursday, May 31, 2018 9:33 AM
ScottPletcher - Thursday, May 31, 2018 9:18 AM
Jeff Moden - Thursday, May 31, 2018 8:32 AM
Sergiy - Wednesday, May 30, 2018 6:48 PM
Whatever kind of identity you choose to use - it's the worst possible choice for a clustered key.
The best candidate for a clustered index is a column (set of columns) which is usually used for range selection.
I've never seen a query using something like "WHERE ID BETWEEN @FromID and @ToID"
As always, "It Depends". While I absolutely agree that most people won't ever have cause to use the WHERE clause you indicated, it's incredibly useful in avoiding any bad page splits without having to do any index maintenance at all for INSERT-only CIs. It's also incredibly useful for OLTP where the IDENTITY column is used as the main lookup. Range selection on a predominately OLTP table isn't the best justification for the selection of the clustered key, especially if someone has a poorly designed table that has very wide rows.
Even on predominately batch-oriented tables, if the row width is excessive and a lot of the columns are never used for reporting, the use of a clustered index to satisfy common query criteria to support range scans may (and frequently does) slow things down a fair bit because it has to deal with the entire row width at the leaf level during a range scan. That would cause a lot of unnecessary data to be loaded into memory as well as causing a relatively huge amount of logical reads, which can be a serious performance problem because even at memory channel speeds, there are still limits and reading through unnecessary page content doesn't make sense.
As with making the mistake of automatically assuming the an IDENTITY column should be used on every table, automatically assuming that the CI should be based on something to support range scans may also be a mistake. "Must look eye".
Absolutely, it depends. Unquestionably identity is sometimes useful as a clustering key, for example, the customer "master" table, where customer_id is almost always the row identifier and is the clus key as well.
But, we've all seen tables clustered on identity that get fragmented. It's just a myth that they don't. It's rarer than for other key types, but it's certainly possible.
As to "unnecessary" data loads for tables with lots of columns, the major part of the problem there is a lack of data modeling, including proper normalization, before any table creation. The vast majority of people completely skip that step, claiming it's not needed. Some of the bad results are ultra-wide tables and having to work around them later to avoid poor performance, or to alleviate it as best as they can.
Skipping modeling also skips encoding data. Long strings, esp. repetitive ones, should often be encoded to a numeric value. For example, many tables now automatically have "user_added" and "user_modified" columns added, as varchars. Those should be numeric codes, rather than repeating the strings in every row (better yet, that type of data should be stored externally to the table, but most shops don't consider that at all). Statuses should never be "Complete", "Shipped", etc., but a code representing that (tinyint is usually more than sufficient).
Finally, back to clustering itself, I've observed that even for people who consider keys other than ident, the general rule is to "default" to identity if nothing else is clearly better. That's just not correct in real world practice. I've tuned literally thousands of SQL dbs, and I've seen literally only a handful of those where 50% or more of the tables were properly clustered by identity. So, the majority of the time, identity is not the best clustering key for the table.
Correct. There is actually no form of "ever increasing key" that is impervious to page splits if they are later updated with "ExpAnsive" data, such as with the variable width columns that people use for things like status or "ModifiedBY". Sam's team actually did avoid that serious mistake but some of their columns still make at least one of the 3 tables provided an INSERT/Expansive UPDATE table and that's a real problem (as we both agree on) when it comes to perpetuating bad page splits, perpetuating fragmentation, perpetuating waste of disk space and memory and (ironically in one of the worst Catch-22s there is) perpetuating the need for index maintenance that doesn't actually prevent any of that.
I also agree on the idea of automatically clustering on an IDENTITY column being a bit idiotic. There are cases, such as those we have in several of our larger databases where it actually is a great idea (as you said) and it covers more like 95% of the tables.
I'm absolutely with you that design should never happen simply through "default" thinking.

Again, I've seen only a handful of dbs where the best clustering was identity on over 50% of the table. Usually the % of ident being best clus key is significantly below 50% even. My best guesstimate is that 30-35%, no more, of tables in an "average" db are best clustered by ident.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

vsamantha35 SSChampion Points: 11551 More actions · Answer 8

Jeff Moden - Thursday, May 31, 2018 9:17 AM
vsamantha35 - Thursday, May 31, 2018 2:15 AM
Jeff Moden - Wednesday, May 30, 2018 3:28 PM
vsamantha35 - Wednesday, May 30, 2018 3:14 PM
Hi Jeff,
Thanks for taking time and providing insights. thanks.
My recommendation is to post the CREATE TABLE code for the table in question along with ALL indexes/constraints so I can have a look at it.
-- I can get you the information.
I also need to know if the table is suffering from the INSERT/expAnsive UPDATE problem that I told you about above.
   --   how can we validate this? what u r saying , if that table is having more INSERTS or more UPDATES??? Is there any DMV to pull this information out?
I might be able to SWAG IT by looking at the CREATE TABLE statement and the indexes. Also, how many rows in your largest 4 tables?
Hi Jeff, Please find the attached tablescript. For sake for simplicity considering 1 parent table and 2 child tables which are GUIDs.
Also, on prod I already have existing data, how can I use an UPDATE statement and replacing the GUIDs with an int/bigint column value and same homework has to be done to all the child tables, how would be my update statement in parent child tables and permanantly get rid of GUIDs then what approach to be taken. Could you please give simple demo example with some few rows on how I can get rid of those GUIDS permanantly from parent and child table.
Again, thanks everyone for the help so far.
Thanks, Sam. I'll do a deep dive on these after work tonight. A cursory glance says there are some problems that are leading to your fragmentation issues (as well as some other basic issues) and changing the Clustered Index to the "NewID" doesn't look like it's actually going to fix your fragmentation issue on at least one of the tables you provided the DDL for.  In fact, it may make things worse by creating an "Insert Hot Spot" and provide you with a new fragmentation problem because the table does look like it's an INSERT/ExpAnsive UPDATE table.
In the mean time, please confirm that you actually are using SQL Server 2012 and let me know if you're using the Standard Edition or the Enterprise Edition. I also need to you run the following code (in the same database as the 3 tables live) near the end of today before you do any index maintenance on the 3 tables you provided (or run it now if it's been a day or two since you've run your index maintenance routines). Run the code in the "Grid Results" mode, copy and paste it into a spreadsheet, and attach the spreadsheet. It'll give me some great understanding as to what we're dealing with on a daily basis.
SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('AMT.ItemVersions'),NULL,NULL,'SAMPLED') UNION ALL SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('AMT.ItemTags'),NULL,NULL,'SAMPLED') UNION ALL SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('Questionnaire.ItemActions'),NULL,NULL,'SAMPLED') ;

Using SQL Server 2016 Service Pack 1. I ll try to get the output tomorrow.

Jeff Moden SSC Guru Points: 1004582 More actions · Answer 9

ScottPletcher - Thursday, May 31, 2018 9:58 AM
Again, I've seen only a handful of dbs where the best clustering was identity on over 50% of the table. Usually the % of ident being best clus key is significantly below 50% even. My best guesstimate is that 30-35%, no more, of tables in an "average" db are best clustered by ident.

In that case, I guess I'd need to know what your definition of the "best clustering".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004582 More actions · Answer 10

vsamantha35 - Thursday, May 31, 2018 11:26 AM
Using SQL Server 2016 Service Pack 1. I ll try to get the output tomorrow.

Thanks. Still need to know the edition, though. Standard or Enterprise? It WILL make a difference for all this.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101189 More actions · Answer 11

Jeff Moden - Thursday, May 31, 2018 11:29 AM
In that case, I guess I'd need to know what your definition of the "best clustering".

Overall db performance. Not obsessed with just page splits, but total I/O overall, including being able to use fewer "covering" indexes (and often not needing covering indexes at all).

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004582 More actions · Answer 12

ScottPletcher - Thursday, May 31, 2018 11:45 AM
Overall db performance. Not obsessed with just page splits, but total I/O overall, including being able to use fewer "covering" indexes (and often not needing covering indexes at all).

Not sure why you use the words "obsessed" or "just" when it comes to page splits. If you don't consider those, then "overall db performance" will suffer and you'll waste a whole lot of resources especially when it comes to defragmenting to recover wasted page space, wasted memory, wasted clock cycles reading pages that are frequently more than 30% empty not to mention huge post defragmentation performance problems because of all that and the fact that bad pages splits cause 5.5 to 43 times more log file activity. And guess what you're doing while all that log file activity takes place? Waiting.

I'll also state that creating a Clustered Index based on the most common queries is a usually a performance mistake, especially on wider tables, because it's a bit like using SELECT * everywhere. You end up reading a heck of a lot more data than you need.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101189 More actions · Answer 13

Jeff Moden - Thursday, May 31, 2018 12:07 PM
Not sure why you use the words "obsessed" or "just" when it comes to page splits. If you don't consider those, then "overall db performance" will suffer and you'll waste a whole lot of resources especially when it comes to defragmenting to recover wasted page space, wasted memory, wasted clock cycles reading pages that are frequently more than 30% empty not to mention huge post defragmentation performance problems because of all that and the fact that bad pages splits cause 5.5 to 43 times more log file activity. And guess what you're doing while all that log file activity takes place? Waiting.
I'll also state that creating a Clustered Index based on the most common queries is a usually a performance mistake, especially on wider tables, because it's a bit like using SELECT * everywhere. You end up reading a heck of a lot more data than you need.

Just not true. Often good clus indexes are sequential anyway, even when they're not identities. For logging tables, it's usually best to use ( add_date ) or ( add_date, $identity ) because they are almost always searched by datetime. For order items, it's best to cluster by ( order_id, ident ), and since order_id is itself an ident, it's mostly in sequential order anyway. Similarly for intersection tables, it's best to use the keys of the related tables, as that's how it's looked up, rather than an identity.

Again, if you've got all overly-wide tables, you've done a poor design. Nothing on the back-end can ever completely fix that. Perhaps just slapping an identity on it and building gazillion covering indexes works just as well, although that's almost never been my experience, in 30+ years of tuning rdbms's.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004582 More actions · Answer 14

ScottPletcher - Thursday, May 31, 2018 12:16 PM
Jeff Moden - Thursday, May 31, 2018 12:07 PM
Not sure why you use the words "obsessed" or "just" when it comes to page splits. If you don't consider those, then "overall db performance" will suffer and you'll waste a whole lot of resources especially when it comes to defragmenting to recover wasted page space, wasted memory, wasted clock cycles reading pages that are frequently more than 30% empty not to mention huge post defragmentation performance problems because of all that and the fact that bad pages splits cause 5.5 to 43 times more log file activity. And guess what you're doing while all that log file activity takes place? Waiting.
I'll also state that creating a Clustered Index based on the most common queries is a usually a performance mistake, especially on wider tables, because it's a bit like using SELECT * everywhere. You end up reading a heck of a lot more data than you need.
Just not true. Often good clus indexes are sequential anyway, even when they're not identities. For logging tables, it's usually best to use ( add_date ) or ( add_date, $identity ) because they are almost always searched by datetime. For order items, it's best to cluster by ( order_id, ident ), and since order_id is itself an ident, it's mostly in sequential order anyway. Similarly for intersection tables, it's best to use the keys of the related tables, as that's how it's looked up, rather than an identity.
Again, if you've got all overly-wide tables, you've done a poor design. Nothing on the back-end can ever completely fix that. Perhaps just slapping an identity on it and building gazillion covering indexes works just as well, although that's almost never been my experience, in 30+ years of tuning rdbms's.

We'll continue to have a difference of opinion there. As for 30 years of experience, congrats but things and situations do change. Perhaps you and I have not had exposure to each other's situations. On that same note, I know lot's of people that have been rebuilding indexes in SQL Server for decades... ton's of experience doing so and will remind everyone of such things... the only problem is, they've been doing it incorrectly. Despite the decades of experience I have, "new" and "different" should not be ignored unless there is proof that they can be on an instance by instance basis.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101189 More actions · Answer 15

Jeff Moden - Thursday, May 31, 2018 3:06 PM
We'll continue to have a difference of opinion there. As for 30 years of experience, congrats but things and situations do change. Perhaps you and I have not had exposure to each other's situations. On that same note, I know lot's of people that have been rebuilding indexes in SQL Server for decades... ton's of experience doing so and will remind everyone of such things... the only problem is, they've been doing it incorrectly. Despite the decades of experience I have, "new" and "different" should not be ignored unless there is proof that they can be on an instance by instance basis.

Table types are generally common to some extent, even across industries. I think many people are in a fixed mode of "identity as clus key, add covering indexes as needed". The problem is those covers have to constantly be tweaked, every time a key query uses a new column. And you still could hit "the tipping point" and switch to a full table scan. And people overlook (or ignore) the total overhead of storing and maintaining all those indexes. And many times the main row has to be looked up anyway, so far from a savings, that approach hurts overall performance.

If you have too-wide tables, you can also split lesser-used columns into another table, 1-1 with the original, a view making this transparent. And/or force very long columns out of the main table into overflow (works well in only selected cases).

Another sign that you should carefully consider changing an identity clustering to something else is if you have another column, say column_c, and that column is the lead column in multiple nonclus indexes (with different includes, of course). Nonclus1 = c & a, d, nonclus4 = c & e, f, t, nonclus8 = c d & e l q r, etc. You should at least consider clustering on c instead.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

GUIDs as clustered index

Cookies on SQLServerCentral