July 30, 2011 at 8:08 am
Hello - I know there's a lot of discussion about tables needing Clustered Indexes and when to use heaps, but a lot of the arguments I hear for Clustered Indexes are "It's standard practice" or "It follows protocol" but I haven't been able to find out why. So I'm trying to fill in puzzle pieces to further my understanding of heap vs clustered index.
so here are my questions:
1) What is the data structure of the IAM that holds all the rowid's for a heap? Is it just a singly linked list? Can this list get fragmented if massive updates/deletes occur?
2) If there is absolutely no need to do any sequential reads from a table, does it need a clustered index? My understanding is all non clustered indexes store the clustered index key, and thus once you do a non clustered index seek, you need to do a clustered index seek. However with a heap, all non clustered indexes point straight to the row. If a table does not ever need sequential rows read, isn't a heap the better performer?
3) If a clustered index is non unique, is a uniquifier (sp?) created on all key entries in the index or only the duplicates?
4) If I have a small static dimension table of 50 rows (i.e US states) that all together takes up less space than a page of data (say 2000 bytes), does it need an clustered index?
July 30, 2011 at 8:43 am
Gabriel P (7/30/2011)
1) What is the data structure of the IAM that holds all the rowid's for a heap? Is it just a singly linked list? Can this list get fragmented if massive updates/deletes occur?
An IAM is a page that contains a bitmap for extent allocations for an index. I assume you mean the pages themelves...
Heaps cannot become fragmented. Fragmentation require a logical order and heaps don't have one. Heaps instead get extent fragmentation and forwarding pointers.
2) If there is absolutely no need to do any sequential reads from a table, does it need a clustered index? My understanding is all non clustered indexes store the clustered index key, and thus once you do a non clustered index seek, you need to do a clustered index seek. However with a heap, all non clustered indexes point straight to the row. If a table does not ever need sequential rows read, isn't a heap the better performer?
All tables should have a clustered index, it's got nothing to do with sequential reads (nonclustered indexes are better for range scans anyway). It has to do with organisation of the table and what SQL is optimised for.
3) If a clustered index is non unique, is a uniquifier (sp?) created on all key entries in the index or only the duplicates?
Only the duplicates
4) If I have a small static dimension table of 50 rows (i.e US states) that all together takes up less space than a page of data (say 2000 bytes), does it need an clustered index?
I would still cluster it. If the table is one page only, it won't get anything else added, but still put a clustered index on (bearing in mind that all tables should have a primary key, even with your 1 page table, you will, from the pk, have either a clustered index or a nonclustered index. If you use a nonclustered, you're creating a second structure, hence a second page)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 30, 2011 at 9:31 am
GilaMonster (7/30/2011)
An IAM is a page that contains a bitmap for extent allocations for an index. I assume you mean the pages themelves...
I am visually trying to figure what object is between the object meta data and the actually row data. When I said IAM, I was referring to this picture
nonclustered indexes are better for range scans anyway
Can you point me to some material on this because I've been taught the opposite my entire career and my head is spinning in disbelief right now :hehe:
4) If I have a small static dimension table of 50 rows (i.e US states) that all together takes up less space than a page of data (say 2000 bytes), does it need an clustered index?
I would still cluster it. If the table is one page only, it won't get anything else added, but still put a clustered index on (bearing in mind that all tables should have a primary key, even with your 1 page table, you will, from the pk, have either a clustered index or a nonclustered index. If you use a nonclustered, you're creating a second structure, hence a second page)
Is this an answer based on best practices or best performance? I understand the best practices but this is still purely a performance based question. I'm pretty sure I can come up with an example involving a data dump that needs to be normalized where adding a primary key would not be beneficial. Your comment "It has to do with organisation of the table and what SQL is optimised for." is kind of the area I am looking for someone to point me in the direction of.
Please do not take this as me questioning your expertise (you've helped me on a number of occasions on these forums), I just am used to people telling me this should be done this way because "that's the way it is" or "best practices" and I'm trying to delve beneath the surface a little deeper to get a better understanding. That's all. 😀
July 30, 2011 at 10:21 am
Gabriel P (7/30/2011)
GilaMonster (7/30/2011)
An IAM is a page that contains a bitmap for extent allocations for an index. I assume you mean the pages themelves...
I am visually trying to figure what object is between the object meta data and the actually row data. When I said IAM, I was referring to this picture
That's an IAM page (Index Allocation Map), a bitmap for extent allocations for an index. One page for each GAM interval with one bit representing each extent in that interval, 1 if the extent is allocated to the index, 0 if it's not. IAM pages exist for heaps, clustered indexes and nonclustered indexes alike and they don't differ between them.
nonclustered indexes are better for range scans anyway
Can you point me to some material on this because I've been taught the opposite my entire career and my head is spinning in disbelief right now :hehe:
Sure. I don't, as a habit, make unfounded claims
http://sqlinthewild.co.za/index.php/2011/02/01/is-a-clustered-index-best-for-range-queries/
4) If I have a small static dimension table of 50 rows (i.e US states) that all together takes up less space than a page of data (say 2000 bytes), does it need an clustered index?
I would still cluster it. If the table is one page only, it won't get anything else added, but still put a clustered index on (bearing in mind that all tables should have a primary key, even with your 1 page table, you will, from the pk, have either a clustered index or a nonclustered index. If you use a nonclustered, you're creating a second structure, hence a second page)
Is this an answer based on best practices or best performance? I understand the best practices but this is still purely a performance based question.
Recommended practices (as in, what's generally best for most scenarios)
I'm pretty sure I can come up with an example involving a data dump that needs to be normalized where adding a primary key would not be beneficial. Your comment "It has to do with organisation of the table and what SQL is optimised for." is kind of the area I am looking for someone to point me in the direction of.
And if you have an example where you've tested and proven that doing something other than the recommended way is best, then go your own way. The general recommendations are there because they are generally best for most cases. If you have found an exception, great.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 30, 2011 at 10:28 am
Interesting article on the non clustered indexes. Thanks
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply