December 24, 2012 at 8:18 am
patrickmcginnis59 (12/24/2012)
DaleB (12/24/2012)
I find this hard to believe. If I have a table with 50 million rows, is SQL really going to move all those down one space to insert one row at the top?While this is probably not exactly how SQL server behaves, balanced tree indexes don't really have to shift all the rows around to put a new one at the logical beginning: http://en.wikipedia.org/wiki/B-tree
Exactly. So, a clustered index really is an index that makes the table "appear" to be physically ordered by the index value. The data is not actually physically ordered that way.
December 24, 2012 at 8:22 am
Exactly. So, a clustered index really is an index that makes the table "appear" to be physically ordered by the index value. The data is not actually physically ordered that way.
Right. It is physically ordered that way when the index is first created. Subsequent additions in the middle may make it only appear so, based on fill factor and what/how much you are adding.
Hakim Ali
www.sqlzen.com
December 24, 2012 at 8:25 am
hakim.ali (12/24/2012)
Exactly. So, a clustered index really is an index that makes the table "appear" to be physically ordered by the index value. The data is not actually physically ordered that way.
Right. It is physically ordered that way when the index is first created. Subsequent additions in the middle may make it only appear so, based on fill factor and what/how much you are adding.
However, if you ever rebuild your index, they will be physically (re)ordered by index value.
Hakim Ali
www.sqlzen.com
December 24, 2012 at 12:38 pm
I'll read up on this some more!
December 24, 2012 at 1:20 pm
Bob-444665 (12/24/2012)
The ordering of the data is not physically written or rewritten in some order when a clustered index is added or removed. This is a persistent myth on and off the internet. It would be a huge and wasteful IO operation.
You are very, very wrong here. When you first create a clustered index or rebuild a clustered index, the data is absolutely ordered physically by the key values. The ordering by keys is maintained only logically. If a new record is added, the record is added to the correct page that it would physically fall into, if there is enough space, but it is inserted in whatever space is large enough to hold it on the page and it's location noted in the slot array at the end of the page. If there is not enough space on the page it should be placed on, the page is split and approximately 50% of the data is placed into the new page (there are rare exceptions where the split is not ~50/50) and then the record inserted into whichever page it belongs.
Yes, it is a huge IO operation. No, it is not wasteful. The only additional work it has to do to make them physically ordered is the sort operation. This is why it is recommended that you have approximately 2.5 times the size of the table for creating or rebuilding a clustered index. 1X for the existing version of it + 1X for the new version of it + .5X for sorting operations.
December 24, 2012 at 3:12 pm
roger.plowman (12/24/2012)
I always liked the library analogy.The clustered index is how the books are shelved (usually by Dewey Decimal number), the non-clustered indexes are the index cards in the card catalogs.
Of course this probably tells you I'm old... (laughing)
Extending this, covering indexes are having the information you're looking for in the card catalog so you don't have to go looking on the shelf. 😀
This is how I always explain it to people.
December 25, 2012 at 8:22 am
As Robert mentioned earlier, data on a page isn't necessarily stored in clustering order, it's just on the right page. This means that a table scan without an ORDER BY clause won't necessarily be in clustering order, another time this happens is when you have Enterprise edition and have concurrent scans running, the second scan will piggy back on the first scan then backtrack to the first portion it missed. I can't remember if this applies to all versions of SQL Server or not.
I come from a DB2 mainframe background where the clustering index was separate from the data, no criticism of the original post, this is a SQL Server forum after all.
Can't believe I'm posting on Christmas day after a bottle of wine. Merry Christmas everyone!
December 25, 2012 at 8:52 am
colin.s.allen (12/25/2012)
... another time this happens is when you have Enterprise edition and have concurrent scans running, the second scan will piggy back on the first scan then backtrack to the first portion it missed. I can't remember if this applies to all versions of SQL Server or not.
You are correct. We call it a merry-go-round scan, and it is an Enterprise-only optimization.
Other reasons data may not be returned ordered include parallel threads, and in certain situations, SQL Server may opt to do a scan in physical order rather than logical order. For example, if there were three pages in physical and logical order and the middle one did a page split putting the new page immediately after the 3rd one, SQL could read the pages in logical order (1 -> 2 -> 4 -> 3) or physical order (1 -> 2 -> 3 -> 4) . This is an optimization because it can read contiguous pages in a single read. So for the above example, the logical order would be 3 reads because the order would make the pages not contiguous, but the physical order could be read in just 1 read because they are contiguous.
I hope that made sense and didn't confuse people further.
colin.s.allen (12/25/2012)
Can't believe I'm posting on Christmas day after a bottle of wine. Merry Christmas everyone!
Merry Christmas (or whatever you celebrate, if anything) to all!
December 26, 2012 at 12:12 am
To say a table "is" a clustered index is specious, at best. I suppose you might approximate truth by saying the table "is" the leaf nodes of a clustered index, but that's a verbal contortion and not worth the distinction.
I'm fine with saying a table "has" a clustered index, just like it "has" a nonclustered index. Or with saying a table "with" a clustered index, which pretty much means the same thing.
Also, let's not forget that neither index as really table-like. Both are
tree-structured and nothing at all like a phone book.
December 26, 2012 at 5:54 am
Dennis Q Miller (12/26/2012)
To say a table "is" a clustered index is specious, at best. I suppose you might approximate truth by saying the table "is" the leaf nodes of a clustered index, but that's a verbal contortion and not worth the distinction.I'm fine with saying a table "has" a clustered index, just like it "has" a nonclustered index. Or with saying a table "with" a clustered index, which pretty much means the same thing.
Also, let's not forget that neither index as really table-like. Both are
tree-structured and nothing at all like a phone book.
+1.
Nice article, Hakim :-).
However I am fine with the terms "table [has] a clustered index". A table is always a table no matter it has any index on it or not. Why I would not like to call the table as "clustered index" itself because it will create further confusion.
For instance, one can ask what if I drop the clustered index ? Will it drop the table as "clustered index is the table"? The answer will be no, dropping a "clustered index" won't drop the "table"!
Therefore, In order to avoid all this confusion I would like to call it as table "has" those indexes.
yep, to clear the difference between Clustered Index & Heap, I'd like to specify that table can remain in one of the two states:
1. Ordered
2. Unordered
If it is ordered (when you have created a clustered index on it) then it is a clustered index.
If it is unordered then it is a heap.
December 26, 2012 at 7:18 am
There's an interesting script written by Dan Guzman here which demonstrates that the physical ordering is not necessarily the same as the clustered index logical ordering.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
December 26, 2012 at 7:40 am
Robert Davis (12/25/2012)
colin.s.allen (12/25/2012)
... another time this happens is when you have Enterprise edition and have concurrent scans running, the second scan will piggy back on the first scan then backtrack to the first portion it missed. I can't remember if this applies to all versions of SQL Server or not.You are correct. We call it a merry-go-round scan, and it is an Enterprise-only optimization.
Thanks for this info. Is there somewhere I can research more the merry-go-round scan in Enterprise? Does it show up at all in the query optimizer?
December 26, 2012 at 8:02 am
Dennis Q Miller (12/26/2012)
To say a table "is" a clustered index is specious, at best. I suppose you might approximate truth by saying the table "is" the leaf nodes of a clustered index, but that's a verbal contortion and not worth the distinction. I'm fine with saying a table "has" a clustered index, just like it "has" a nonclustered index. Or with saying a table "with" a clustered index, which pretty much means the same thing.Also, let's not forget that neither index as really table-like. Both are tree-structured and nothing at all like a phone book.
I don't disagree with any of this, you are right of course. My article was never meant to be an in-depth look at index structures (there is a very good 'Stairways' series on it right here on SQLServerCentral, highly recommended). Rather, it was meant to be a simple and easy to understand explanation for those new to the subject.
Hakim Ali
www.sqlzen.com
December 26, 2012 at 8:11 am
Divine Flame (12/26/2012)
Nice article, Hakim :-).
However I am fine with the terms "table [has] a clustered index". A table is always a table no matter it has any index on it or not. Why I would not like to call the table as "clustered index" itself because it will create further confusion.
For instance, one can ask what if I drop the clustered index ? Will it drop the table as "clustered index is the table"? The answer will be no, dropping a "clustered index" won't drop the "table"!
Therefore, In order to avoid all this confusion I would like to call it as table "has" those indexes.
yep, to clear the difference between Clustered Index & Heap, I'd like to specify that table can remain in one of the two states:
1. Ordered
2. Unordered
If it is ordered (when you have created a clustered index on it) then it is a clustered index.
If it is unordered then it is a heap.
Thanks. You are right. Part of the reason I wrote this, was in response to an interview question where a candidate said "it is not too expensive to put multiple clustered indexes on a table." The concept so completely eluded this person, and a few others I have spoken to, that I wanted to demonstrate that a table and its clustered index are very closely aligned. Hence the impetus of my argument. I know there are differences, I know that the table only is the leaf nodes (like Dennis Miller pointed out). If you were trying to explain all this to somebody like my interview candidate, it would probably not work. Thus, the simplification as step one. Never meant for it to be misleading, just simple enough to grasp.
Hakim Ali
www.sqlzen.com
December 26, 2012 at 4:40 pm
Thank you for providing the article. It does provide a simple explanation for people new to the subject. But I think it should be pointed out (preferrably in the article) that you are simplifying a gret deal to make it easy to understand.
As Vadivel points out it is not completely true to say that the records of a table with a clustered index are not always in the order of the clustered index. Itzek Ben-Gan went further and said that it was a myth that the records were perfectly sorted in the order of the clustered index.
Of course, it is a useful myth and your description is probably a fine place for beginners to start, but they should know that it is a simplification and not be surprised when they find out the reality is more complicated.
---
Timothy A Wiseman
SQL Blog: http://timothyawiseman.wordpress.com/
Viewing 15 posts - 31 through 45 (of 53 total)
You must be logged in to reply to this topic. Login to reply