May 6, 2010 at 2:00 pm
This is the most difficult question I've seen to date, and I've been on SSC a few years now. It's an excellent compilation of information, though, and I will use the question, explanation, and comments almost as a small study guide or textbook for the topic of clustered indexes.
Thanks,
webrunner
-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html
May 6, 2010 at 3:46 pm
Of 928 respondents only 21 have answered it correctly. A whopping 2% correct.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
May 6, 2010 at 4:03 pm
The part of me ready for trick questions missed the part about NCL having CL pointers in the IX pages. I thought "well, only the leaf level pages have pointers to the CL, so he's trying to trick us". Is my understanding of the data structure correct? Just looking for a clarification. Good question, though.
Thanks,
Eric
May 6, 2010 at 4:07 pm
Strommy (5/6/2010)
The part of me ready for trick questions missed the part about NCL having CL pointers in the IX pages. I thought "well, only the leaf level pages have pointers to the CL, so he's trying to trick us". Is my understanding of the data structure correct? Just looking for a clarification. Good question, though.Thanks,
Eric
Yes, that was the same thing that caused me to miss.
May 6, 2010 at 4:14 pm
Nadrek (5/6/2010)
This is basically a difference in terminology, and Microsoft's BOL entry is badly written.So, we have a table, with pages, and a clustered index:
Page 1 contains a-b
Page 2 contains b-t
Page 3 contains u-w
Page 4 contains w-z
All are full.
The pages that make up the table are in "clustered index order".
However, on disk (where "physical" starts to actually matter, particularly on rotating or tape media), if we have 2 data files for this database/filegroup and the data's split evenly, even assuming the table isn't fragmented and the files aren't fragmented (pretty unlikely without good planning and maintenance, or luck): we're likely to have something like:
Page 1 on Sector 500
Page 2 on Sector 800
Page 3 on Sector 501
Page 4 on Sector 801
If we have fragmentation at either OS (which SQL Server cannot fix) or SQL Server level, we might have:
Page 1 on Sector 999
Page 2 on Sector 232
Page 3 on Sector 555
Page 4 on Sector 77
You are forgetting page splits. If you fill up page 2 with "s" records and the next contiguous free page is after page 4 in the data file (page 5), then you get pointers between page 2 and 5, which now contains half of your [b-t] rows. Now your physical order is off, but the logical order is intact because of the pointers between page 2 and page 5.
Thanks,
Eric
May 6, 2010 at 5:02 pm
Good question. The only place where I disagree with author is location of clustered index columns inside the non-clustered indexes. They are located in leaf pages not in index (B-tree) pages.
Andrei
May 6, 2010 at 5:40 pm
Whoa! I returned to the computer after a few hours absence to find my mailbox almost overflowing with message notifications!
I won't respond to all comments individually. I read a lot of compliments, for which I thank you all.
I don't agree with the comments I read stating that 2% correct answers is good. When I submit a question, I want it to be difficult, but not almost impossible. I shoot for 20-30% correct on this type of "select all that apply" questions, a little more on the multiple choice questions since even blind guessing generates a significant percentage of correct answers on those. With hindsight, I should have divided the info in this question over two or three seperate questions; that would have been hard enough. As it is, I am glad Steve only gave 1 point to this question; otherwise it would have felt like cheating you out of your points.
I also hear a loud strong dislike of the "check all that apply" kind of questions. I'll try to use less of those and more of the "select the single correct answer" type questions for future QotD submissions.
Several people have rightfully commented on the "All nonclustered indexes include the clustered index columns in their index pages" option. I managed to include two errors in this single option.
The first one is caused by typing too fast and insufficient proofreading. I intended this option to refer to leaf pages. As several people have commented, the root and intermediate pages do not contain the clustered index columns, this is only the case on leaf pages.
The other error I made was that I did not think of heaps when adding this option (which is of course quite foolish, considering that I do mention them in the explanation). Now technically, one might say that if there is no clustered index then all clustered index columns (i.e., none) are added to the leaf pages. But that would be nitpicking and wordplay - exactly the kind of thing I hate with a vengeance in every kind of test, exam, or question.
My apologies to those who got tripped up or confused by my errors on this question.
I have also seen people quoting various fragments from Books Online that imply or even straightforward state that a clustered index does dictate the physical ordering of data on the disk. All I can see to that is that those Books Online is misleading or even wrong in those places. This is very easy to confirm by making a test database, creating a table, filling it in a way that ensures high fragmentation, and then using DBCC PAGE to check the contents of the actual database pages. I can't post a repro for this right now as it's already 1:30 AM where I live and I need to get some sleep, but I'll be happy to put someting together tomorrow if you guys want me to.
If I have missed any remark or question that you feel should be adressed, or if you have new questions or concerns, let me know. I'm going to sleep now, but I'll have more time for SQL Server Central tomorrow.
May 6, 2010 at 10:35 pm
An excellent question - or was it several questions all rolled into one?
Some extremely important concepts - unbelievably assigned just a single point! Madness.
I got this right, but I did pause over this one:
All nonclustered indexes include the clustered index columns in their index.
...not just because of the heap option but because the clustering keys will not be present in the non-leaf levels, if the non-clustered index is UNIQUE. The clustering key will still be at the leaf level - every index leaf row needs a row locator - so I decided to tick the box and was rewarded with a green tick and one point.
On the physical ordering thing, I would just add that individual rows on the page may not be in physical clustered index either - only the page offset array entries are guaranteed to be in logical and physical order.
Thanks Hugo.
May 6, 2010 at 10:56 pm
Juts wanted to add my comment about the physical order of rows in table.
In case when clustered index defined on table they are in "physical order" (as per BooksOnline), and this is true to some degree, as the word "physical" here does not mean that data in physical order as
a) rows inside the data page may be in different physical order
b) pages may not be in physical order inside the data file
c) the data file may be fragmented and not in physical order on the disk
d) the disk may be physically dispersed across a number of physical disks
and so on ...
This sequence may continue even further, but all this shows that any "physical" may happen to be logical if we look at this closer.
😉
Andrei.
May 7, 2010 at 2:00 am
UMG Developer (5/6/2010)
I missed it just because of the physical order part... They are physically stored in the clustered order, maybe just not contiguously/in sequence. 😉 The BOL docs don't help much.
Thats the part that I got wrong too. There isn't much on Physical order - we just have to infer it I think. One would really have to know about disk fragmentation to get that right I guess.
May 7, 2010 at 2:03 am
thanks for this wonderful question 🙂
May 7, 2010 at 3:22 am
Paul White NZ (5/6/2010)
On the physical ordering thing, I would just add that individual rows on the page may not be in physical clustered index either - only the page offset array entries are guaranteed to be in logical and physical order.
Paul, could you elaborate a bit more on this? BTW, thanks for the great articles on APPLY and Server-Side Paging.
Hugo, thanks for a great question, learned something new again.
May 7, 2010 at 4:03 am
Jan Van der Eecken (5/7/2010)
Paul White NZ (5/6/2010)
On the physical ordering thing, I would just add that individual rows on the page may not be in physical clustered index either - only the page offset array entries are guaranteed to be in logical and physical order.Paul, could you elaborate a bit more on this? BTW, thanks for the great articles on APPLY and Server-Side Paging.
Hugo, thanks for a great question, learned something new again.
You're welcome!
I'm not Paul, but I can elaborate as well. Suppose that a table has 500-byte rows, and a page is filled with 5 of them, that have been added in sequence of the clustered index key. Examining the page with DBCC PAGE will probably reveal a structure like this (for simplicity, I leave out the various page header fields and start counting bytes at the start of the actual row data).
Bytes 1 - 500: Row #1
Bytes 501 - 1000: Row #2
Bytes 1001 - 1500: Row #3
Bytes 1501 - 2000: Row #4
Bytes 2001 - 2500: Row #5
Bytes 2501 - 8000: unused
Last ten bytes of page: five 2-byte values representing the decimal values 2001, 1501, 1001, 501, 1.
These last 10 bytes are the "Row Offset Array". Reading from last to first, they tell SQL Server the start position of the first, second, third, fourth, and fifth row on the page.
Now I add a new row that, according to clustered index order, sits between #2 and #3 (let's call the now one Row #2.5). Instead of moving 1500 bytes to a different location on the page, SQL Server simply adds the new row in the unused space and moves only the 2-byte locators in the Row Offset Array. The new page layout will be:
Bytes 1 - 500: Row #1
Bytes 501 - 1000: Row #2
Bytes 1001 - 1500: Row #3
Bytes 1501 - 2000: Row #4
Bytes 2001 - 2500: Row #5
Bytes 2501 - 5000: Row #2.5
Bytes 3001 - 8000: unused
Last twelve bytes of page: six 2-byte values representing the decimal values 2001, 1501, 1001, 2501, 501, 1.
Reading the Row Offset Array backwards and locating the row, you get this order:
From position 1: Row #1
From position 501: Row #2
From position 2501: Row #2.5
From position 1001: Row #3
From position 1501: Row #4
From position 2001: Row #5
So the physical order of the Row Offset Array matches the logical order imposed by the clustered index; the physical order of rows in the page does not.
I hope this clarifies your question.
(And Paul - thanks for bringing up this excellent point).
May 7, 2010 at 4:31 am
Jan Van der Eecken (5/7/2010)
Paul, could you elaborate a bit more on this? BTW, thanks for the great articles on APPLY and Server-Side Paging.
Gosh thanks Jan! Haven't seen you around for a while, so good to see you again.
@hugo: Thank you for elaborating so clearly - saved me quite a lot of typing - win!!!
May 7, 2010 at 4:48 am
Thanks, Hugo, nicely explained.
Paul, I haven't been away, just been sitting quietly on the sidelines observing and learning.
Viewing 15 posts - 31 through 45 (of 54 total)
You must be logged in to reply to this topic. Login to reply