July 10, 2003 at 2:49 am
Hi,
All our tables have a GUID as the primary key. It's a decision from the past and for now we have to live with it, so bear with me.
In the Indexes window in the Sql Enterprise Manager [Sql 2000] we can set the index on the PK as Clustered or Non-clustered. I know what the difference is (ordered storage, vs. an ordered index of pointers into unordered storage).
I would expect that with Clustering, an 'insert' may take occasionally more time if it is time to make space for a record that does not fit where Sql Server needs to put it. Some of our tables have millions of records so this rearranging may take time (how much??), and cause locking and perhaps even deadlocks (which we experience every now and then, even on trivial Select statements). Therefore I'd say: make them all Non-clustered. Esp. because in all cases where a conceptual or logical record order would exist, it will definitely not be based on GUID values.
However our DBA insists on using Clustered indexes on all PKs, claiming that having only Non-clustered indexes on a table will cause Sql Server to create its own "hidden" Clustered index which is then beyond his control.
Is the DBA right that Clustered is the way to go? If we don't make a Clustered index, does Sql 2000 construct one automatically or does it not? Is there maybe a good article on this subject matter, esp. one that covers both GUIDs and (non-)Clustering?
Thanx,
Pjotr
July 10, 2003 at 3:42 am
The GUID part only matters because the inserts are going to be relatively random, where if you used an identity or other incrementing value all the inserts would go at the end. How often page splits occur largely depends on how long a row is and how much free space you have on your data pages. Rebuilding indexes nightly/weekly might well prevent any performance degradation you might see from the splits. I might be in the minority, but I worry about page splits when they affect performance. Since SQL seems to handle inserts very well, this has never been the case. Note - having good to excellent hardware helps.
As far as I know SQL doesn't make a hidden clustered index. If you don't have a clustered index you have a heap (to say the least!). I don't agree with making the primary key the clustered index by default. I look at each table to see if there is a column that will contain range type values that would make better use of the clustered index. Not having one can potentially slow down ALL your queries.
Lot's of discussion about such things here on the site, I have an old article up about it as well that has some great comments from readers attached:
Andy
July 10, 2003 at 5:08 am
You should not under any circumstance use GUIDs as a Clustered Index. The reason is the GUIDs will cause so much data shifting it will bog your server down over a short period in a high transaction environment and will suffer high lock contention.
As for a hidden clustered index that is not true. The table remains in a heap with no logic order. When the Non-Clustered index is built it will have the column data for each occurrance and will have thefile and page number with it to know where to look for the value.
Also, a clustered index does not mean the data is in sequential page order just with each page and the leaf indexes. In the leaf indexes there will be ranges to mak lookup faster but ultimately it is a matter of file and page number it jumps between leafs and to the page.
Now nother point why you would not want the GUID to be a clustered index is that when a clustered index is present all non-clustered indexes have the associated clustered index data value. In fact when a lookup goes accross a non-clustered index where a clustered index is present it will ten tranverse the clustered index to find the data pages needed. So in other word the guid will be attached to every non-clustered index row and again data changes will cause non-clustered indexes to be affected (this makes for very wide space consuming indexes).
Sometimes it is best to use a PK as a clustered index especially when it is a reference number you will always use for lookups. Otherwise try to get your most unique and slimest row which you query most often and make it the clustered index.
July 10, 2003 at 5:23 am
quote:
As for a hidden clustered index that is not true. The table remains in a heap with no logic order. When the Non-Clustered index is built it will have the column data for each occurrance and will have thefile and page number with it to know where to look for the value.
I think, I know what the dba meant.
If you create a nonclustered index and don't already have a clustered index, SQL Server has to create a 'phantom' clustered index anyway because nonclustered indexes always point to clustered index keys. So you might create your own clustered index , ensuring that it will be of some use.
Cheers,
Frank
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
July 10, 2003 at 5:57 am
Actually looking directly at the index data it does not create a clustered index. Does exactly what I stated.
First I created a table like so.
CREATE TABLE [tbl_AgentList] (
[CCID] [int] NOT NULL ,
[AgentID] [int] NOT NULL ,
[AgentName] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[DteTme] [datetime] NOT NULL CONSTRAINT [DF_tbl_AgentList_DteTme] DEFAULT (getdate()),
CONSTRAINT [IX_tbl_AgentList] UNIQUE NONCLUSTERED
(
[CCID],
[AgentID]
) ON [PRIMARY]
) ON [PRIMARY]
GO
Then filled it will 632 rows.
Once done I ran
select * from sysindexes where [id] = object_id('tbl_AgentList') AND [name] = 'IX_tbl_AgentList'
and got the value from the column "first"
0x1F1A00000100
next I run
dbcc traceon(3604)
so I can actually get the data back from DBCC PAGE.
Then I transalated the value from first above to it's page value
Page File
1F1A0000 0100
Convert
1A1F0000 0001
or
6687 1
So page 6687 on filegroup 1
And thus I run
DBCC page(dbname,filegroup,page,option)
I prefer the layout of option 3
DBCC page('score',1,6687,3) --Score is my db name
and I see the following example data
PAGE: (1:6687)
--------------
BUFFER:
-------
BUF @0x00DA3180
---------------
bpage = 0x1B5EC000 bhash = 0x00000000 bpageno = (1:6687)
bdbid = 6 breferences = 3 bstat = 0xb
bspin = 0 bnext = 0x00000000
PAGE HEADER:
------------
Page @0x1B5EC000
----------------
m_pageId = (1:6687) m_headerVersion = 1 m_type = 2
m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x4
m_objId = 1364915934 m_indexId = 5 m_prevPage = (0:0)
m_nextPage = (1:6717) pminlen = 17 m_slotCnt = 426
m_freeCnt = 2 m_freeData = 7338 m_reservedCnt = 0
m_lsn = (6245:121:108) m_xactReserved = 0 m_xdesId = (0:0)
m_ghostRecCnt = 0 m_tornBits = 0
Allocation Status
-----------------
GAM (1:2) = ALLOCATED SGAM (1:3) = ALLOCATED
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED 0_PCT_FULL DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED
FileId PageId Row Level CCID AgentID ?
------ ----------- ------ ------ ----------- ----------- ------------------
1 6687 0 0 NULL NULL NULL
1 6687 1 0 2271 20126 0xD601000001000100
1 6687 2 0 2271 20127 0xD601000001000200
1 6687 3 0 2271 20128 0xD601000001000300
1 6687 4 0 2271 20130 0xD601000001000400--I want this record
1 6687 5 0 2271 20131 0xD601000001000500
1 6687 6 0 2271 20132 0xD601000001000600
1 6687 7 0 2271 20133 0xD601000001000700
1 6687 8 0 2271 20145 0xD601000001000800
1 6687 9 0 2271 20147 0xD601000001000900
1 6687 10 0 2271 20148 0xD601000001000A00
1 6687 11 0 2271 20149 0xD601000001000B00
1 6687 12 0 2271 20150 0xD601000001000C00
1 6687 13 0 2271 20151 0xD601000001000D00
1 6687 14 0 2271 20152 0xD601000001000E00
1 6687 15 0 2271 21500 0xD601000001000F00
1 6687 16 0 2271 21502 0xD601000001001000
(Note: 0xD601000001000400
D6010000 0100 0400
Page File RowInThisIndex)
So the record I want has CCID 2271 and AgentID 20130 and its value D60100000100 leads me to Page 470 and filegroup 1
and thus I run
DBCC page('score',1,470,3)
PAGE: (1:470)
-------------
BUFFER:
-------
BUF @0x00D9D100
---------------
bpage = 0x1B2E8000 bhash = 0x00000000 bpageno = (1:470)
bdbid = 6 breferences = 23 bstat = 0x49
bspin = 0 bnext = 0x00000000
PAGE HEADER:
------------
Page @0x1B2E8000
----------------
m_pageId = (1:470) m_headerVersion = 1 m_type = 1
m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x0
m_objId = 1364915934 m_indexId = 0 m_prevPage = (0:0)
m_nextPage = (1:159) pminlen = 20 m_slotCnt = 117
m_freeCnt = 3055 m_freeData = 5029 m_reservedCnt = 0
m_lsn = (5355:386:2) m_xactReserved = 0 m_xdesId = (0:1591211)
m_ghostRecCnt = 0 m_tornBits = 805439525
Allocation Status
-----------------
GAM (1:2) = ALLOCATED SGAM (1:3) = ALLOCATED
PFS (1:1) = 0x62 MIXED_EXT ALLOCATED 80_PCT_FULL DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED
...Data removed always starts at Slot 0...
Slot 4 Offset 0xfe3
-------------------
Record Type = PRIMARY_RECORD
Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
1B2E8FE3: 00140030 000008df 00004ea2 00a414fa 0........N......
1B2E8FF3: 000092cc 01000004 45002a00 72617764 .........*.Edwar
1B2E9003: 202c7364 6e616d41 6164 ds, Amanda
CCID = 2271
AgentID = 20130
AgentName = Edwards, Amanda
DteTme = Nov 22 2002 9:57AM
...Data removed in this case it ended at slot 116 so a total of 117 records on the page...
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
I did not cross any clustered indexes with this non-clustered index.
Hope this helps a bit on understanding how an index tranverses the data.
July 10, 2003 at 5:59 am
quote:
I think, I know what the dba meant.If you create a nonclustered index and don't already have a clustered index, SQL Server has to create a 'phantom' clustered index anyway because nonclustered indexes always point to clustered index keys. So you might create your own clustered index , ensuring that it will be of some use.
Just curious, where is this found? I have never come across this statement documented anywhere.
July 10, 2003 at 6:01 am
Hi James,
don't blame this on me. What I have written was taken word by word from a book about SQL Server Performance Optimization
Cheers,
Frank
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
July 10, 2003 at 6:06 am
quote:
Hi James,don't blame this on me. What I have written was taken word by word from a book about SQL Server Performance Optimization
Cheers,
Frank
Not blamming anyone. I just want to know for sure why this is said but in looking it is not that way. Maybe there is a point at which it occurrs that I just haven't seen. Do you know the name of the book or is it "SQL Server Performance Optimization"?
July 10, 2003 at 11:24 pm
Hi,
This is slightly off topic but why is using a GUID for a PK considered bad practice?
We use replication so GUID's as PK's (not clustered) are the way to go but I wouldn't have thought they had a particular downside. Is it because they a bigger that an identity column?
Cheers, Peter
July 10, 2003 at 11:50 pm
I think that having a GUID as a PK is an acceptable (didn't mention preferred) practice. Having it then clustered seems to make no sense whatsoever. I highly doubt that there are very many tables of GUID's that would have range queries ran against the GUIDs. (I'm sure there might be a few, but doubt that it's very often).
Like was mentioned before, use some other field for the clustered index. Possibly a field that is queried often and maybe even with range type queries.
David
July 11, 2003 at 4:36 am
Just wanted to flash around some interesting definitions from BOL:
quote:
Nonclustered indexes can be defined on a table with a clustered index, a heap, or an indexed view. In Microsoft® SQL Server™ 2000, the row locators in nonclustered index rows have two forms:If the table is a heap (does not have a clustered index), the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The entire pointer is known as a Row ID.
If the table does have a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server 2000 makes duplicate keys unique by adding an internally generated value. This value is not visible to users; it is used to make the key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.
I.e if the table is a Heap, the Pointer is NOT into any other index, it points directly into the Data Pages.
(With the unfortunate effect of the potential 'DualLookup' effect (when the sought data row has been moved and the RID in the NonClustered index is just pointing to another pointer...))
Regards, Hans!
P.S The edit was for my poor english skills 😉
Edited by - hanslindgren on 07/11/2003 04:40:11 AM
July 11, 2003 at 4:40 am
quote:
Nonclustered indexes can be defined on a table with a clustered index, a heap, or an indexed view. In Microsoft® SQL Server™ 2000, the row locators in nonclustered index rows have two forms:If the table is a heap (does not have a clustered index), the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The entire pointer is known as a Row ID.
If the table does have a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server 2000 makes duplicate keys unique by adding an internally generated value. This value is not visible to users; it is used to make the key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.
Thanks Hans. Knew I had read the details somewhere and could put my fingers on it. But As I posted I show exactly what is described in the above statement.
July 11, 2003 at 4:43 am
Antares:
I forgot to mention that your demonstration was exceptionally exhaustive and nice 🙂
Regards, Hans!
July 11, 2003 at 5:01 am
quote:
Antares:I forgot to mention that your demonstration was exceptionally exhaustive and nice 🙂
yes it was indeed
...and proofed what Brian Kelley mentioned in another thread NOT to rely only on one single source of information!
Cheers,
Frank
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
July 11, 2003 at 5:37 am
Thanks all for the great replies! I will try again to convince our DBA to revise some of his methods.
If I understand correctly, a datapage which becomes full does not mean that all data has to be moved, just the data from that page is split in two pages. So the overhead of a page split does not grow as a table grows.
However this may not apply to the time needed to rebuild the other (non-clustered) indexes. Or does it simply do a partial update of those too?
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply