October 25, 2011 at 7:49 am
I've got a big table with millions of records. It doesnt have any unique key and the uniqey key is 70 character long. I assume it is not good idea to make that cluster index. Now if i've auto increment number and have it as clustered index how is it going to help with my query retrival?
The index data is stored in physical order but that auto id is not going to help with my retrival because i dont use that auto id at all in any of my query?
I agree with non cluster that it will help other columns to join on and you can retrive data quickly but i dont understand how auto id clustered index can help? if i dont have auto id clustered index then i cant have any other clustered index because they are not unique.
I read lot of things about index and i understand B-tree, leaf level, pointer, pages, split, fragmentation but still it doesnt make sense to me about the choice of index on certain table.
October 25, 2011 at 9:24 am
The clustered index is the physical order, so it helps in terms of ordering the data back if the table is scanned. So typically I try to cluster on columns that will be used in range scans (like dates) or those columns that will not cause page splits.
So GUIDs aren't great because the inserts can be random in the table, causing lots of page splits. An identity column doens't give you that for inserts. Does it help you with queries? Not necessarily.
The clustered index should be picked to minimize the page splits while also handling range queries (or aggregates) if they're appropriate on the table. In general, you hope that your NC indexes handle most of the load, which might be included columns in the index.
October 25, 2011 at 9:30 am
Clustered index = logical order. Not physical order. Must blog.
The clustered index should be picked to minimize the page splits while also handling range queries (or aggregates) if they're appropriate on the table. In general, you hope that your NC indexes handle most of the load, which might be included columns in the index.
Absolutely (though covering nonclustered indexes are better for range queries than clustered indexes)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 25, 2011 at 9:33 am
dva2007 (10/25/2011)
It doesnt have any unique key and the uniqey key is 70 character long.
If it doesn't have a unique key, how are you able to determine it's length?
If your index doesn't return all of the fields in your query, then it still needs to look up the record in order to retrieve those fields. If I understand it correctly, the optimizer will use the clustered index--if it is available--for this lookup, which is why having one will speed up your queries.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
October 25, 2011 at 9:48 am
Clustered index = logical order. Not physical order. Must blog.
Gail, would you consider writing an article about this? It would generate lots of interest, I am sure.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
October 25, 2011 at 9:50 am
+1...please, it would be much appreciated.
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
October 25, 2011 at 9:50 am
Phil Parkin (10/25/2011)
Clustered index = logical order. Not physical order. Must blog.
Gail, would you consider writing an article about this? It would generate lots of interest, I am sure.
I'll write a blog post when I get back to blogging. It's trivial to prove and it should, in fact, be obvious that a clustered index doesn't guarantee or enforce physical order.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 25, 2011 at 10:18 am
GilaMonster (10/25/2011)
Phil Parkin (10/25/2011)
Clustered index = logical order. Not physical order. Must blog.
Gail, would you consider writing an article about this? It would generate lots of interest, I am sure.
I'll write a blog post when I get back to blogging. It's trivial to prove and it should, in fact, be obvious that a clustered index doesn't guarantee or enforce physical order.
Well, Mr Steve Jones, what do you think about that? :w00t:
Just kidding. Looking forward to that blog.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
October 25, 2011 at 1:51 pm
Yes and no, and I shouldn't write physical order. I realize what Gail is saying and it is true that the clustered index doesn't mean that the extents and pages are listed as page 1, then page 2, then page 3 on disk.
However there is physical ordering of the rows inside the pages, and in attempts to maintain the rows in the physical order in pages, which results in page splits. The extents are allocated as available, and they can be anywhere on disk.
October 25, 2011 at 2:59 pm
Steve Jones - SSC Editor (10/25/2011)
However there is physical ordering of the rows inside the pages, and in attempts to maintain the rows in the physical order in pages, which results in page splits.
SQL Server doesn't keep any PHYSICAL ordering inside the pages. What for? If any data from the page is needed the whole page will be read and then sorted using offset table.
And there is an easy way to prove it:
USE master
GO
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'ClusterIndexTest') DROP DATABASE [ClusterIndexTest]
GO
CREATE DATABASE [ClusterIndexTest]
GO
USE ClusterIndexTest
GO
CREATE TABLE dbo.TestTable (ID int primary key, field varchar(50))
insert dbo.TestTable values(1, REPLICATE('A', 50))
insert dbo.TestTable values(3, REPLICATE('B', 50))
insert dbo.TestTable values(2, REPLICATE('C', 50))
GO
DBCC TRACEON ( 3604 )
DBCC IND('ClusterIndexTest','dbo.TestTable', 1)
---- use PagePID from the previous result where PageType = 1
declare @dbid int = DB_ID()
DBCC PAGE (@dbid , 1 , PagePID , 1 )
GO
DROP DATABASE [ClusterIndexTest]
So I got this result:
Row - Offset
2 (0x2) - 161 (0xa1)
1 (0x1) - 226 (0xe2)
0 (0x0) - 96 (0x60)
Which clearly means that the Row with ID = 2 stored after ID = 3.
And the page splits are nothing to do with order of rows as well. Page split happens when there is no space on the page to fit new or updated row and order inside the page doesn't matter at all.
October 25, 2011 at 3:07 pm
I stand corrected then. It was always my understanding physical order was maintained inside pages.
October 25, 2011 at 3:27 pm
Steve Jones - SSC Editor (10/25/2011)
However there is physical ordering of the rows inside the pages and in attempts to maintain the rows in the physical order in pages, which results in page splits.
I think I know what you mean, but it's not quite what you said...
Let's say there's an index (clustered or not, it makes no difference) on an int column with 10 rows fitting onto a page and we inserted values 1-100.
Rows with key 1..10 will be on one page.
Rows with key 11..20 will be on another page
Rows with key 21..30 will be on a 3rd page
etc, etc, so on and so forth.
What order rows 1 to 10 are physically written onto the page is irrelevant and may not match the key order at all (in fact, if the table was insert-only, it'll be the insert order).
Which rows go onto which pages is critical for index navigation, so that the upper b-tree levels can be traversed to find a row, but the actual position of the row on the page is not important for locating a row, and so SQL doesn't bother wasting the processing time to put them into order.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 25, 2011 at 3:34 pm
drew.allen (10/25/2011)
dva2007 (10/25/2011)
If your index doesn't return all of the fields in your query, then it still needs to look up the record in order to retrieve those fields. If I understand it correctly, the optimizer will use the clustered index--if it is available--for this lookup, which is why having one will speed up your queries.Drew
Actually it's vice versa. Having a clustered key will slow down a lookup operation. Because if it's a heap then on the leaf level of any non-clustered indexes there is an RID which directly points to the data, so only one logical operation is needed to get to the data (assuming that there are no forwarding pointers). But if you have clustered key then at the bottom of non-cl index SQL engine will find just a value which will be used to perform Clustered Index Seek. And if you have, let's say, 4 levels in clustered B-Tree then 4 pages must be accessed to perform the seek, comparing to 1 page in case you don't have clustered index.
October 26, 2011 at 1:45 am
Hi Drew,
I've unique key but it is combination of different fields and looks like this:
200102_00001407BFFFDCD0F3DFCE2D2B8301ED_200102_1_1
It is unique but do you think we can make this clustered index? it will take edges to sort the data of 100 million in logical order based on this key.
I've just asked that assuming if there is no unique key then what will be the choice for clustered key?
In this case it seems there is lot of importance for clustered key and i understand the concept but it doesnt work in this case. Also autoid is the only choice for me to have a clustered index or no clustered index.
Thanks.
October 26, 2011 at 1:49 am
Steve, Gail,
I always assumed that clustered index put the data in the physical order in the table.
------------------------------------------------------------------------------------------------------------
Again according to this article here it says following:
http://www.sql-server-performance.com/2007/clustered-indexes/
Here are some more good reasons to add a clustered index to every table.
Keep in mind that a clustered index physically orders the data in a table based on a single or composite column. Second, the data in a heap (a table without a clustered index) is not stored in any particular physical order.
------------------------------------------------------------------------------------------------------------
I would look forward for the blog of gail to get proper understanding.
Thank you all for your comments.
Viewing 15 posts - 1 through 15 (of 29 total)
You must be logged in to reply to this topic. Login to reply