December 19, 2013 at 8:04 am
Jeff Moden (12/18/2013)
ScottPletcher (12/18/2013)
If your lookups will most often be by column2, you need to cluster the table on column2, or an encoded version of it, rather than having a dopey, useless cluster on column1.Since column2 is varchar(10), you could encode it into an integer and then cluster on that. Your lookups then become something like:
WHERE column2_code = (SELECT column2_code FROM dbo.column2_values WHERE column2_value = 'xxx')
You can keep column1 as the PK if you really, really want to, although it's meaningless really.
I wouldn't cluster based on Column2 because it doesn't follow the best recommendation of being narrow, unique, and ever-increasing. If you cluster on Column2, you're just asking for page splits.
That's why rebuilds, and in specific cases, fillfactor, exist.
Yes, column2 doesn't follow the general guideline -- which is NOT supposed to be an absolute rule -- that so many people foolishly follow slavishly, but it will perform vastly better.
Prabhu:
If you want the best overall performance, ignore overly simplistic "rules" and choose the best clustered index for your specific table's needs. Leave using nursery rhymes to do database design to others.
How about this? Try it on this table with the queries you've shown. You should see massive improvement across the board.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 19, 2013 at 1:49 pm
ScottPletcher (12/19/2013)
Jeff Moden (12/18/2013)
ScottPletcher (12/18/2013)
If your lookups will most often be by column2, you need to cluster the table on column2, or an encoded version of it, rather than having a dopey, useless cluster on column1.Since column2 is varchar(10), you could encode it into an integer and then cluster on that. Your lookups then become something like:
WHERE column2_code = (SELECT column2_code FROM dbo.column2_values WHERE column2_value = 'xxx')
You can keep column1 as the PK if you really, really want to, although it's meaningless really.
I wouldn't cluster based on Column2 because it doesn't follow the best recommendation of being narrow, unique, and ever-increasing. If you cluster on Column2, you're just asking for page splits.
That's why rebuilds, and in specific cases, fillfactor, exist.
Yes, column2 doesn't follow the general guideline -- which is NOT supposed to be an absolute rule -- that so many people foolishly follow slavishly, but it will perform vastly better.
Prabhu:
If you want the best overall performance, ignore overly simplistic "rules" and choose the best clustered index for your specific table's needs. Leave using nursery rhymes to do database design to others.
How about this? Try it on this table with the queries you've shown. You should see massive improvement across the board.
I think that "narrow, unique, and ever-increasing" has become a rule of thumb for a selecting a clustering key for *very* good reasons. I also think that the advice to "choose a column that will most frequently be used in query conditions as the clustering key" omits many important considerations. A query will generally perform better with a well-constructed covering index because the rows of the index will be narrower than the clustered index (or heap) rows, meaning that more of them will fit on a data page and fewer data pages will have to be read to return the same number of rows. Offering rebuilds and fillfactors as a solution to page split issues reflects an incomplete analysis. Rebuilds and lower fillfactors involve significant overhead in processing resources and storage space that must be weighed in the balance. I think it borders on irresponsible to advise someone to disregard widely recognized best practices so cavalierly.
I think that the OP should start with strategies that have been proven time and time again - choose a clustering key that facilitates inserts and allows for efficient management of storage space. Tune queries, then create covering indexes to serve frequently run queries that still perform unacceptably. Finally, weigh the improvement in query performance against the costs of maintaining the indexes (most commonly, increased storage space and degradation of performance for inserts, updates, and deletes) and decide whether the balance is favorable or unfavorable, a determination that depends heavily upon the specific circumstances. When that balance is determined to be unfavorable, it may be appropriate to consider unconventional approaches, but those approaches should be rigorously evaluated and accepted only if they provide benefits in excess of their drawbacks and after widely recognized best practices prove inadequate.
Jason Wolfkill
December 19, 2013 at 2:03 pm
wolfkillj (12/19/2013)
ScottPletcher (12/19/2013)
Jeff Moden (12/18/2013)
ScottPletcher (12/18/2013)
If your lookups will most often be by column2, you need to cluster the table on column2, or an encoded version of it, rather than having a dopey, useless cluster on column1.Since column2 is varchar(10), you could encode it into an integer and then cluster on that. Your lookups then become something like:
WHERE column2_code = (SELECT column2_code FROM dbo.column2_values WHERE column2_value = 'xxx')
You can keep column1 as the PK if you really, really want to, although it's meaningless really.
I wouldn't cluster based on Column2 because it doesn't follow the best recommendation of being narrow, unique, and ever-increasing. If you cluster on Column2, you're just asking for page splits.
That's why rebuilds, and in specific cases, fillfactor, exist.
Yes, column2 doesn't follow the general guideline -- which is NOT supposed to be an absolute rule -- that so many people foolishly follow slavishly, but it will perform vastly better.
Prabhu:
If you want the best overall performance, ignore overly simplistic "rules" and choose the best clustered index for your specific table's needs. Leave using nursery rhymes to do database design to others.
How about this? Try it on this table with the queries you've shown. You should see massive improvement across the board.
I think that "narrow, unique, and ever-increasing" has become a rule of thumb for a selecting a clustering key for *very* good reasons. I also think that the advice to "choose a column that will most frequently be used in query conditions as the clustering key" omits many important considerations. A query will generally perform better with a well-constructed covering index because the rows of the index will be narrower than the clustered index (or heap) rows, meaning that more of them will fit on a data page and fewer data pages will have to be read to return the same number of rows. Offering rebuilds and fillfactors as a solution to page split issues reflects an incomplete analysis. Rebuilds and lower fillfactors involve significant overhead in processing resources and storage space that must be weighed in the balance. I think it borders on irresponsible to advise someone to disregard widely recognized best practices so cavalierly.
I think that the OP should start with strategies that have been proven time and time again - choose a clustering key that facilitates inserts and allows for efficient management of storage space. Tune queries, then create covering indexes to serve frequently run queries that still perform unacceptably. Finally, weigh the improvement in query performance against the costs of maintaining the indexes (most commonly, increased storage space and degradation of performance for inserts, updates, and deletes) and decide whether the balance is favorable or unfavorable, a determination that depends heavily upon the specific circumstances. When that balance is determined to be unfavorable, it may be appropriate to consider unconventional approaches, but those approaches should be rigorously evaluated and accepted only if they provide benefits in excess of their drawbacks and after widely recognized best practices prove inadequate.
Those are NOT NOT NOT "best practices".
The best practice is to carefully and accurately select the best clustering key for each table based on that specific table's needs, not to blindly follow a nursery rhyme.
Being forced to building covering indexes for all/most queries doesn't make an identity cluster a good idea, it proves it's often a very bad one. Aside from the inherent overhead of all those covering indexes -- which, btw, must be fragmented themselves, since they have the same lead key -- most companies just don't have the resources to do the continual, forced monitoring of queries to avoid sudden and frustrating debilitating performance hits as a new column is added to a query and the query then requires a full table scan, since the index is no longer covering. Yes, if you have unlimited manpower to spend hours going thru every SQL query in complete detail ahead of deployment you can avoid most of that, but it's very difficult. For a huge company with developers scattered across the world working on too-tight schedules, it's just impossible. Sufficient time has never been allowed for that at any company I've been at, and I've been a full-time DBA for 25 years.
Yes, rules of thumb exist because they are effective overall. An expert is someone who knows when to override them.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 19, 2013 at 2:29 pm
I had to create an index that added all the columns in the include of the index. This was because i created a function that has to return all the columns from several table. In that instance i suppose adding all the columns to the include isnt a bad idea however i'm not an expert.
December 19, 2013 at 2:44 pm
Snargables (12/19/2013)
I had to create an index that added all the columns in the include of the index. This was because i created a function that has to return all the columns from several table. In that instance i suppose adding all the columns to the include isnt a bad idea however i'm not an expert.
Again, you would not have to do that if you just cluster the table properly.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 20, 2013 at 1:12 am
Those are NOT NOT NOT "best practices".
The best practice is to carefully and accurately select the best clustering key for each table based on that specific table's needs, not to blindly follow a nursery rhyme.
Being forced to building covering indexes for all/most queries doesn't make an identity cluster a good idea, it proves it's often a very bad one. Aside from the inherent overhead of all those covering indexes -- which, btw, must be fragmented themselves, since they have the same lead key -- most companies just don't have the resources to do the continual, forced monitoring of queries to avoid sudden and frustrating debilitating performance hits as a new column is added to a query and the query then requires a full table scan, since the index is no longer covering. Yes, if you have unlimited manpower to spend hours going thru every SQL query in complete detail ahead of deployment you can avoid most of that, but it's very difficult. For a huge company with developers scattered across the world working on too-tight schedules, it's just impossible. Sufficient time has never been allowed for that at any company I've been at, and I've been a full-time DBA for 25 years.
Yes, rules of thumb exist because they are effective overall. An expert is someone who knows when to override them.
Yes, Scott,
I agree with you, its a very meaningful Explanation.
I too don't feel good in adding all the columns in Include Option.
Thanks for your time and valueable comments,
Prabhu
December 21, 2013 at 12:19 pm
ScottPletcher (12/19/2013)
wolfkillj (12/19/2013)
ScottPletcher (12/19/2013)
Jeff Moden (12/18/2013)
ScottPletcher (12/18/2013)
If your lookups will most often be by column2, you need to cluster the table on column2, or an encoded version of it, rather than having a dopey, useless cluster on column1.Since column2 is varchar(10), you could encode it into an integer and then cluster on that. Your lookups then become something like:
WHERE column2_code = (SELECT column2_code FROM dbo.column2_values WHERE column2_value = 'xxx')
You can keep column1 as the PK if you really, really want to, although it's meaningless really.
I wouldn't cluster based on Column2 because it doesn't follow the best recommendation of being narrow, unique, and ever-increasing. If you cluster on Column2, you're just asking for page splits.
That's why rebuilds, and in specific cases, fillfactor, exist.
Yes, column2 doesn't follow the general guideline -- which is NOT supposed to be an absolute rule -- that so many people foolishly follow slavishly, but it will perform vastly better.
Prabhu:
If you want the best overall performance, ignore overly simplistic "rules" and choose the best clustered index for your specific table's needs. Leave using nursery rhymes to do database design to others.
How about this? Try it on this table with the queries you've shown. You should see massive improvement across the board.
I think that "narrow, unique, and ever-increasing" has become a rule of thumb for a selecting a clustering key for *very* good reasons. I also think that the advice to "choose a column that will most frequently be used in query conditions as the clustering key" omits many important considerations. A query will generally perform better with a well-constructed covering index because the rows of the index will be narrower than the clustered index (or heap) rows, meaning that more of them will fit on a data page and fewer data pages will have to be read to return the same number of rows. Offering rebuilds and fillfactors as a solution to page split issues reflects an incomplete analysis. Rebuilds and lower fillfactors involve significant overhead in processing resources and storage space that must be weighed in the balance. I think it borders on irresponsible to advise someone to disregard widely recognized best practices so cavalierly.
I think that the OP should start with strategies that have been proven time and time again - choose a clustering key that facilitates inserts and allows for efficient management of storage space. Tune queries, then create covering indexes to serve frequently run queries that still perform unacceptably. Finally, weigh the improvement in query performance against the costs of maintaining the indexes (most commonly, increased storage space and degradation of performance for inserts, updates, and deletes) and decide whether the balance is favorable or unfavorable, a determination that depends heavily upon the specific circumstances. When that balance is determined to be unfavorable, it may be appropriate to consider unconventional approaches, but those approaches should be rigorously evaluated and accepted only if they provide benefits in excess of their drawbacks and after widely recognized best practices prove inadequate.
Those are NOT NOT NOT "best practices".
The best practice is to carefully and accurately select the best clustering key for each table based on that specific table's needs, not to blindly follow a nursery rhyme.
Being forced to building covering indexes for all/most queries doesn't make an identity cluster a good idea, it proves it's often a very bad one. Aside from the inherent overhead of all those covering indexes -- which, btw, must be fragmented themselves, since they have the same lead key -- most companies just don't have the resources to do the continual, forced monitoring of queries to avoid sudden and frustrating debilitating performance hits as a new column is added to a query and the query then requires a full table scan, since the index is no longer covering. Yes, if you have unlimited manpower to spend hours going thru every SQL query in complete detail ahead of deployment you can avoid most of that, but it's very difficult. For a huge company with developers scattered across the world working on too-tight schedules, it's just impossible. Sufficient time has never been allowed for that at any company I've been at, and I've been a full-time DBA for 25 years.
Yes, rules of thumb exist because they are effective overall. An expert is someone who knows when to override them.
While I absolutely agree that just blindly implementing anything is not a best practice, especially when it comes to the Clustered Index of a table, I've found that it's usually "just" crap code written in a thoughtless, all-in-one, monolithic fashion without an understanding of the data itself that require things like multiple covering indexes in order to get any performance out of them.
With the understanding that I absolutely agree that blindly following any best practice can lead to problems in many cases, the "nursery rhyme" of "narrow, unique, not NULLable, fixed width, static, and ever increasing" for a Clustered Index actually IS a best practice for more reasons than many people are aware of. Do notice that I didn't say that should necessarily be an IDENTITY column. And, again, I want to stress that I agree that best practices should be a starting point of consideration and not the end all to be all.
To stress the point that Scott makes, "Knowledge" can be thought of as knowing that a tomato is a fruit and not a vegetable. "Wisdom" is knowing that it probably shouldn't be used in a fruit salad. 😀
With that good thought in mind, here are some links to some "Microsoft Master" movies produced by Kimberly Tripp, a former MS employee that supported many of the things having to do with the optimizer and other things having to do with indexing (dunno if she wrote code for it or not). The movies are well worth the viewing from start to finish because they provide a huge amount of information that many people are unaware of when selecting what the Clustered Index should be and why.
http://technet.microsoft.com/en-us/sqlserver/gg508878
http://technet.microsoft.com/en-US/sqlserver/gg508879.aspx
To give you some incentive to take the time to watch the movies, consider the following. Some of the things that people might not be aware of is that the keys of the Clustered Index (CI) are automatically added to the Non-Clustered Indexes (NCI) if they're not declared as part of the NCI. For example, if you have a 2 column NCI and a 5 column CI, the leaf level of the NCI will actually contain all 7 columns for all rows in the table. If the CI is not unique, then there are other columns and bytes of overhead that will be added to the B-Tree and Leaf Level making a "simple" 2 column index a monster on the disk. Of course, the wider an index, the slower it will run because there will be fewer rows per page. There are a ton of other "gotcha's" covered in both of the movies above. I strongly recommend that anyone who reads this watch them and study them.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 7 posts - 16 through 21 (of 21 total)
You must be logged in to reply to this topic. Login to reply