February 14, 2011 at 10:43 pm
Comments posted to this topic are about the item Understanding Indexes
February 14, 2011 at 10:50 pm
Two points I'm going to disagree with
Avoid the creation of an index on a column in a table when the data stored in that column has a large number of repeated data values are stored in the column.
Low density indexes can be useful, especially if they are covering indexes
If the database supports a transaction processing system, then avoid using indexes because the performance of the insertion of new transactions will be degraded.
Absolutely not. An OLTP system needs small numbers of well-chosen, effective indexes. None at all means terrible select, update and delete performance. The inserts may be fast, but I've never seen a write-only database.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 15, 2011 at 3:29 am
Plus, inserts in to a heap are not necessarily fastest:
February 15, 2011 at 3:45 am
At this point of time, I am expecting more from this Article like Unique Index, Filtered Index etc..
but 🙁
February 15, 2011 at 5:34 am
# If the database supports a transaction processing system, then avoid using indexes because the
# performance of the insertion of new transactions will be degraded.
I disagree with this statement. I manage a reasonably good-sized OLTP system. I wouldn't even want to think about trying to manage it without good indexing.
Inserts might be a bit faster without indexes (I'm not going to argue that one too much, but some of my testing shows that inserts into heaps aren't always faster than inserts into indexed tables) - but in a multi-user system with applications and users requesting information from the same tables that those inserts are running against, the performance hits to NOT index those tables would be awful.
-Ki
February 15, 2011 at 5:39 am
Define a clustered index on the column that appears most frequently in the WHERE clause of SELECT
statements.
But not in JOIN criteria? No, no way. The basic rule for clusters should be the most frequently used access path to the data. This may be primary keys, this may be foreign keys, or it might be simply search criteria, but I wouldn't suggest limiting it to WHERE clauses.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 15, 2011 at 5:42 am
Kiara (2/15/2011)
# If the database supports a transaction processing system, then avoid using indexes because the
# performance of the insertion of new transactions will be degraded.
I disagree with this statement. I manage a reasonably good-sized OLTP system. I wouldn't even want to think about trying to manage it without good indexing.
Inserts might be a bit faster without indexes (I'm not going to argue that one too much, but some of my testing shows that inserts into heaps aren't always faster than inserts into indexed tables) - but in a multi-user system with applications and users requesting information from the same tables that those inserts are running against, the performance hits to NOT index those tables would be awful.
Yeah, that's probably the most dangerous statement in the article. There are some excellent tests that show that heaps don't perform better for inserts. And they certainly don't for deletes or updates.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 15, 2011 at 5:50 am
Yeah, that's probably the most dangerous statement in the article. There are some excellent tests that show that heaps don't perform better for inserts. And they certainly don't for deletes or updates.
I choked on my coffee when I read the OP's statement - enough to actually drop out of lurker mode and say something. 🙂 Considering the amount of time I spent when I started in this position dealing with index management to speed up database performance in an OLTP database, I was really, really surprised to see that in what appears to be an academic treatise.
-Ki
February 15, 2011 at 6:04 am
Kiara (2/15/2011)
Considering the amount of time I spent when I started in this position dealing with index management to speed up database performance in an OLTP database, I was really, really surprised to see that in what appears to be an academic treatise.
The academic nature of the article, along with the presentation as absolute fact were the two things that worried me. I do a lot of writing about indexes and I'm very careful to test anything that I claim and not to make absolute statements.
There's no figures, no indication that any testing was done, and several of the 'facts' are myths that I see repeated again and again.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 15, 2011 at 7:03 am
GilaMonster (2/15/2011)
The academic nature of the article, along with the presentation as absolute fact were the two things that worried me. I do a lot of writing about indexes and I'm very careful to test anything that I claim and not to make absolute statements.There's no figures, no indication that any testing was done, and several of the 'facts' are myths that I see repeated again and again.
Completely agreed. Your posts are based in practice, testing, and experience - and you can back up what you write about. (Yes, I have a couple of them bookmarked. 🙂 )
I'd love to hear back from the OP on the actual reason that article was written and who the original intended audience was.
(Edited to fix formatting)
-Ki
February 15, 2011 at 7:30 am
Gail and Grant (or anyone else that can jump in),
The article also has the following statements:
...[font="Arial Black"]whereas, with a nonclustered index the leaf level contains the address to the data pages contained in the search.[/font] Simply put, a clustered index is stored and sorted with the data in the clustered index key order. [font="Arial Black"]While the nonclustered index may be thought of as a pointer to the location of the data to which it refers.[/font]
I believe, from what you've shown me with DBCC IND, that the bolded statements in the above quote from the article are just not correct. To wit, if they were correct, even a covering index would cause "BookMark Lookups". Since I'm not an expert on indexes that you good folks are, can you confirm that the bolded statements in the above quote are, depending on whether it's a covering index or not, incorrect... or not? I realize that data is stored in the index but that data is actually used if it's a covering index.
Or... perhaps I'm just being a bit too critical of an introductory article.
{edit} And, yes, I realize that either the clustered index or an internal row number (for heaps) is used in an index which may be thought of as an "address" of sorts but the statements in the quote make it sound as if only address information is stored or that the lookup data stored cannot be used directly.
Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 15, 2011 at 7:46 am
Grant Fritchey (2/15/2011)
Define a clustered index on the column that appears most frequently in the WHERE clause of SELECT
statements.
But not in JOIN criteria? No, no way. The basic rule for clusters should be the most frequently used access path to the data. This may be primary keys, this may be foreign keys, or it might be simply search criteria, but I wouldn't suggest limiting it to WHERE clauses.
Again, I'm not the "index Ninja" here, but I've also found that a table that suffers huge numbers of inserts can make very good use of a Clustered Index on an auto-numbering column such as an IDENTITY column or, perhaps, a date column to keep page splits to a reasonable level whether or not that column is the most frequently used in WHERE or JOIN criteria.
Have any of you good folks experienced the same or is there some other practice that folks use on a high insertion rate table?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 15, 2011 at 8:06 am
"Select columns that are unique such as an IDENTITY column or contain many distinct values." ???
Really? Does the fact that a column is an identity column really qualify it for an index if the identity is never used in a query? This is an honest question, NOT sarcasm. I certainly have a lot to learn about indexes still. But I ask this because I recently spent some time with a customer that was looking to upgrade his storage because of what he perceived were disk contention issues. I came in with an eye to look at tuning the SQL environment before recommending any hardware or software upgrades and I found that he had followed the advice of a SQL blog that recommended making all of his clustered indexes on the primary keys (which all happened to be IDENTITY columns), none of which were ever mentioned in queries to the database, which I know might be unusual, but this data was only used for 2 import processes into what was essentially a mostly read environment (I know that is not normal and we are discussing warehousing) which never called these specific ID columns after import.
The point I am making is that we need to be careful about the reasons we choose columns to index on and avoid focusing on a SINGLE criteria, like "it's an IDENTITY column". These statements should have caveats and qualifiers, like "index on an IDENTITY column or unique column IF you use it as part of a query. If you never or rarely query against a column, you may never see a benefit, maybe only add overhead for no good reason". The result of removing these indexes from these ID columns and only adding a couple of indexes on data that was actually queried resulted in major performance improvements and refocused the equipment purchase.
And a statement like ...
"Conceptually, one can consider an index to be a data structure containing pairs ((a1,...,an), b), where (a1,c,an) is the tuple of values for the attributes A1,c,An and b is the address of the tuples in r with these values, that is, a pointer to the set ,c, " (Mannila & Raiha, 1992)"
...is not very helpful in a level 100 post for "a junior DBA". 🙂
And if you want to describe the use of B-trees, maybe explaining what a B-tree is would be helpful. 🙂 Even though most of us who post and read here may have a strong grasp, remember the intended audience.
But thanks for bringing it up. Indexes are a very important part of what we do BUT let me add, STATISTICS maybe more important. For level 300 or 400 people you might enjoy this from the MCM series:
http://technet.microsoft.com/en-us/sqlserver/ff977043.aspx
I highly recommend even junior DBA's review some of these videos. They give insight into some topics that may be a little too advanced, but anyone of any level can get some good info.
Thanks for the post Ron! 😀
Peter Trast
Microsoft Certified ...(insert many literal strings here)
Microsoft Design Architect with Alexander Open Systems
February 15, 2011 at 8:14 am
Jeff Moden (2/15/2011)
Grant Fritchey (2/15/2011)
Define a clustered index on the column that appears most frequently in the WHERE clause of SELECT
statements.
But not in JOIN criteria? No, no way. The basic rule for clusters should be the most frequently used access path to the data. This may be primary keys, this may be foreign keys, or it might be simply search criteria, but I wouldn't suggest limiting it to WHERE clauses.
Again, I'm not the "index Ninja" here, but I've also found that a table that suffers huge numbers of inserts can make very good use of a Clustered Index on an auto-numbering column such as an IDENTITY column or, perhaps, a date column to keep page splits to a reasonable level whether or not that column is the most frequently used in WHERE or JOIN criteria.
Have any of you good folks experienced the same or is there some other practice that folks use on a high insertion rate table?
I'm not an "index Niinja" either. I leave that to Gail, but in my experience and with my understanding, you're 100% right on here. Again, I like my own definition of "most frequently used access path" because that can apply to inserts as well as updates, deletes or selects. It doesn't matter. Since the cluster holds the data, you need to take advantage of that fact in whatever way is most advantageous to the system you're working on.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 15, 2011 at 8:17 am
Jeff Moden (2/15/2011)
Gail and Grant (or anyone else that can jump in),The article also has the following statements:
...[font="Arial Black"]whereas, with a nonclustered index the leaf level contains the address to the data pages contained in the search.[/font] Simply put, a clustered index is stored and sorted with the data in the clustered index key order. [font="Arial Black"]While the nonclustered index may be thought of as a pointer to the location of the data to which it refers.[/font]
I believe, from what you've shown me with DBCC IND, that the bolded statements in the above quote from the article are just not correct. To wit, if they were correct, even a covering index would cause "BookMark Lookups". Since I'm not an expert on indexes that you good folks are, can you confirm that the bolded statements in the above quote are, depending on whether it's a covering index or not, incorrect... or not? I realize that data is stored in the index but that data is actually used if it's a covering index.
Or... perhaps I'm just being a bit too critical of an introductory article.
{edit} And, yes, I realize that either the clustered index or an internal row number (for heaps) is used in an index which may be thought of as an "address" of sorts but the statements in the quote make it sound as if only address information is stored or that the lookup data stored cannot be used directly.
Thanks.
I think what he means there is that the rest of the data, that which is not a part of the nonclustered key or added by the INCLUDE statement, is stored at the clustered index and a pointer back to the cluster is stored at the leaf of the nonclustered index. I'm pretty sure that's what he meant, but the way it's written, I think it is subject to the interpretation you're giving it. He's basically right on this one, but it's phrased a bit awkwardly. At least that's my opinion on it.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
Viewing 15 posts - 1 through 15 (of 26 total)
You must be logged in to reply to this topic. Login to reply