If PK is a GUID, what index type is best?

Question

If PK is a GUID, what index type is best?

pjotrb

SSC Rookie

Points: 34
More actions
July 10, 2003 at 2:49 am

#106186

Hi,
All our tables have a GUID as the primary key. It's a decision from the past and for now we have to live with it, so bear with me.
In the Indexes window in the Sql Enterprise Manager [Sql 2000] we can set the index on the PK as Clustered or Non-clustered. I know what the difference is (ordered storage, vs. an ordered index of pointers into unordered storage).
I would expect that with Clustering, an 'insert' may take occasionally more time if it is time to make space for a record that does not fit where Sql Server needs to put it. Some of our tables have millions of records so this rearranging may take time (how much??), and cause locking and perhaps even deadlocks (which we experience every now and then, even on trivial Select statements). Therefore I'd say: make them all Non-clustered. Esp. because in all cases where a conceptual or logical record order would exist, it will definitely not be based on GUID values.
However our DBA insists on using Clustered indexes on all PKs, claiming that having only Non-clustered indexes on a table will cause Sql Server to create its own "hidden" Clustered index which is then beyond his control.
Is the DBA right that Clustered is the way to go? If we don't make a Clustered index, does Sql 2000 construct one automatically or does it not? Is there maybe a good article on this subject matter, esp. one that covers both GUIDs and (non-)Clustering?
Thanx,
Pjotr

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Andy Warren SSC Guru Points: 119902 More actions · Answer 1

The GUID part only matters because the inserts are going to be relatively random, where if you used an identity or other incrementing value all the inserts would go at the end. How often page splits occur largely depends on how long a row is and how much free space you have on your data pages. Rebuilding indexes nightly/weekly might well prevent any performance degradation you might see from the splits. I might be in the minority, but I worry about page splits when they affect performance. Since SQL seems to handle inserts very well, this has never been the case. Note - having good to excellent hardware helps.

As far as I know SQL doesn't make a hidden clustered index. If you don't have a clustered index you have a heap (to say the least!). I don't agree with making the primary key the clustered index by default. I look at each table to see if there is a column that will contain range type values that would make better use of the clustered index. Not having one can potentially slow down ALL your queries.

Lot's of discussion about such things here on the site, I have an old article up about it as well that has some great comments from readers attached:

http://www.sqlservercentral.com/columnists/awarren/worstpracticesnotusingprimarykeysandclusteredindex.asp

Andy

http://www.sqlservercentral.com/columnists/awarren/

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter

Antares686 SSC Guru Points: 125444 More actions · Answer 2

You should not under any circumstance use GUIDs as a Clustered Index. The reason is the GUIDs will cause so much data shifting it will bog your server down over a short period in a high transaction environment and will suffer high lock contention.

As for a hidden clustered index that is not true. The table remains in a heap with no logic order. When the Non-Clustered index is built it will have the column data for each occurrance and will have thefile and page number with it to know where to look for the value.

Also, a clustered index does not mean the data is in sequential page order just with each page and the leaf indexes. In the leaf indexes there will be ranges to mak lookup faster but ultimately it is a matter of file and page number it jumps between leafs and to the page.

Now nother point why you would not want the GUID to be a clustered index is that when a clustered index is present all non-clustered indexes have the associated clustered index data value. In fact when a lookup goes accross a non-clustered index where a clustered index is present it will ten tranverse the clustered index to find the data pages needed. So in other word the guid will be attached to every non-clustered index row and again data changes will cause non-clustered indexes to be affected (this makes for very wide space consuming indexes).

Sometimes it is best to use a PK as a clustered index especially when it is a reference number you will always use for lookups. Otherwise try to get your most unique and slimest row which you query most often and make it the clustered index.

Frank Kalis SSC Guru Points: 111183 More actions · Answer 3

quote:
As for a hidden clustered index that is not true. The table remains in a heap with no logic order. When the Non-Clustered index is built it will have the column data for each occurrance and will have thefile and page number with it to know where to look for the value.

I think, I know what the dba meant.

If you create a nonclustered index and don't already have a clustered index, SQL Server has to create a 'phantom' clustered index anyway because nonclustered indexes always point to clustered index keys. So you might create your own clustered index , ensuring that it will be of some use.

Cheers,

Frank

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Antares686 SSC Guru Points: 125444 More actions · Answer 4

Actually looking directly at the index data it does not create a clustered index. Does exactly what I stated.

First I created a table like so.

CREATE TABLE [tbl_AgentList] (

[CCID] [int] NOT NULL ,

[AgentID] [int] NOT NULL ,

[AgentName] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

[DteTme] [datetime] NOT NULL CONSTRAINT [DF_tbl_AgentList_DteTme] DEFAULT (getdate()),

CONSTRAINT [IX_tbl_AgentList] UNIQUE NONCLUSTERED

(

[CCID],

[AgentID]

) ON [PRIMARY]

GO

Then filled it will 632 rows.

Once done I ran

select * from sysindexes where [id] = object_id('tbl_AgentList') AND [name] = 'IX_tbl_AgentList'

and got the value from the column "first"

0x1F1A00000100

next I run

dbcc traceon(3604)

so I can actually get the data back from DBCC PAGE.

Then I transalated the value from first above to it's page value

Page File

1F1A0000 0100

Convert

1A1F0000 0001

or

6687 1

So page 6687 on filegroup 1

And thus I run

DBCC page(dbname,filegroup,page,option)

I prefer the layout of option 3

DBCC page('score',1,6687,3) --Score is my db name

and I see the following example data



PAGE: (1:6687)

--------------



BUFFER:

-------



BUF @0x00DA3180

---------------

bpage = 0x1B5EC000        bhash = 0x00000000        bpageno = (1:6687)

bdbid = 6                 breferences = 3           bstat = 0xb

bspin = 0                 bnext = 0x00000000        



PAGE HEADER:

------------



Page @0x1B5EC000

----------------

m_pageId = (1:6687)       m_headerVersion = 1       m_type = 2

m_typeFlagBits = 0x0      m_level = 0               m_flagBits = 0x4

m_objId = 1364915934      m_indexId = 5             m_prevPage = (0:0)

m_nextPage = (1:6717)     pminlen = 17              m_slotCnt = 426

m_freeCnt = 2             m_freeData = 7338         m_reservedCnt = 0

m_lsn = (6245:121:108)    m_xactReserved = 0        m_xdesId = (0:0)

m_ghostRecCnt = 0         m_tornBits = 0            



Allocation Status

-----------------

GAM (1:2) = ALLOCATED     SGAM (1:3) = ALLOCATED    

PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL   DIFF (1:6) = CHANGED

ML (1:7) = NOT MIN_LOGGED 

FileId PageId      Row    Level  CCID        AgentID     ?                  

------ ----------- ------ ------ ----------- ----------- ------------------ 

1      6687        0      0      NULL        NULL        NULL

1      6687        1      0      2271        20126       0xD601000001000100

1      6687        2      0      2271        20127       0xD601000001000200

1      6687        3      0      2271        20128       0xD601000001000300

1      6687        4      0      2271        20130       0xD601000001000400--I want this record

1      6687        5      0      2271        20131       0xD601000001000500

1      6687        6      0      2271        20132       0xD601000001000600

1      6687        7      0      2271        20133       0xD601000001000700

1      6687        8      0      2271        20145       0xD601000001000800

1      6687        9      0      2271        20147       0xD601000001000900

1      6687        10     0      2271        20148       0xD601000001000A00

1      6687        11     0      2271        20149       0xD601000001000B00

1      6687        12     0      2271        20150       0xD601000001000C00

1      6687        13     0      2271        20151       0xD601000001000D00

1      6687        14     0      2271        20152       0xD601000001000E00

1      6687        15     0      2271        21500       0xD601000001000F00

1      6687        16     0      2271        21502       0xD601000001001000

(Note: 0xD601000001000400

D6010000 0100 0400

Page File RowInThisIndex)

So the record I want has CCID 2271 and AgentID 20130 and its value D60100000100 leads me to Page 470 and filegroup 1

and thus I run

DBCC page('score',1,470,3)





PAGE: (1:470)

-------------



BUFFER:

-------



BUF @0x00D9D100

---------------

bpage = 0x1B2E8000        bhash = 0x00000000        bpageno = (1:470)

bdbid = 6                 breferences = 23          bstat = 0x49

bspin = 0                 bnext = 0x00000000        



PAGE HEADER:

------------



Page @0x1B2E8000

----------------

m_pageId = (1:470)        m_headerVersion = 1       m_type = 1

m_typeFlagBits = 0x0      m_level = 0               m_flagBits = 0x0

m_objId = 1364915934      m_indexId = 0             m_prevPage = (0:0)

m_nextPage = (1:159)      pminlen = 20              m_slotCnt = 117

m_freeCnt = 3055          m_freeData = 5029         m_reservedCnt = 0

m_lsn = (5355:386:2)      m_xactReserved = 0        m_xdesId = (0:1591211)

m_ghostRecCnt = 0         m_tornBits = 805439525    



Allocation Status

-----------------

GAM (1:2) = ALLOCATED     SGAM (1:3) = ALLOCATED    

PFS (1:1) = 0x62 MIXED_EXT ALLOCATED  80_PCT_FULL   DIFF (1:6) = CHANGED

ML (1:7) = NOT MIN_LOGGED 



...Data removed always starts at Slot 0...



Slot 4 Offset 0xfe3

-------------------

Record Type = PRIMARY_RECORD                        

Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS   

1B2E8FE3:  00140030  000008df  00004ea2  00a414fa 0........N......

1B2E8FF3:  000092cc  01000004  45002a00  72617764 .........*.Edwar

1B2E9003:  202c7364  6e616d41      6164           ds, Amanda

CCID                             = 2271             

AgentID                          = 20130            

AgentName                        = Edwards, Amanda  

DteTme                           = Nov 22 2002  9:57AM



...Data removed in this case it ended at slot 116 so a total of 117 records on the page...



DBCC execution completed. If DBCC printed error messages, contact your system administrator.

I did not cross any clustered indexes with this non-clustered index.

Hope this helps a bit on understanding how an index tranverses the data.

Antares686 SSC Guru Points: 125444 More actions · Answer 5

quote:
I think, I know what the dba meant.
If you create a nonclustered index and don't already have a clustered index, SQL Server has to create a 'phantom' clustered index anyway because nonclustered indexes always point to clustered index keys. So you might create your own clustered index , ensuring that it will be of some use.

Just curious, where is this found? I have never come across this statement documented anywhere.

Frank Kalis SSC Guru Points: 111183 More actions · Answer 6

Hi James,

don't blame this on me. What I have written was taken word by word from a book about SQL Server Performance Optimization

Cheers,

Frank

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Antares686 SSC Guru Points: 125444 More actions · Answer 7

quote:
Hi James,
don't blame this on me. What I have written was taken word by word from a book about SQL Server Performance Optimization
Cheers,
Frank

Not blamming anyone. I just want to know for sure why this is said but in looking it is not that way. Maybe there is a point at which it occurrs that I just haven't seen. Do you know the name of the book or is it "SQL Server Performance Optimization"?

jonespm Ten Centuries Points: 1393 More actions · Answer 8

Hi,

This is slightly off topic but why is using a GUID for a PK considered bad practice?

We use replication so GUID's as PK's (not clustered) are the way to go but I wouldn't have thought they had a particular downside. Is it because they a bigger that an identity column?

Cheers, Peter

sumlin Old Hand Points: 353 More actions · Answer 9

I think that having a GUID as a PK is an acceptable (didn't mention preferred) practice. Having it then clustered seems to make no sense whatsoever. I highly doubt that there are very many tables of GUID's that would have range queries ran against the GUIDs. (I'm sure there might be a few, but doubt that it's very often).

Like was mentioned before, use some other field for the clustered index. Possibly a field that is queried often and maybe even with range type queries.

David

Hans Lindgren SSChampion Points: 10496 More actions · Answer 10

Just wanted to flash around some interesting definitions from BOL:

quote:
Nonclustered indexes can be defined on a table with a clustered index, a heap, or an indexed view. In Microsoft® SQL Server™ 2000, the row locators in nonclustered index rows have two forms:
If the table is a heap (does not have a clustered index), the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The entire pointer is known as a Row ID.
If the table does have a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server 2000 makes duplicate keys unique by adding an internally generated value. This value is not visible to users; it is used to make the key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.

I.e if the table is a Heap, the Pointer is NOT into any other index, it points directly into the Data Pages.

(With the unfortunate effect of the potential 'DualLookup' effect (when the sought data row has been moved and the RID in the NonClustered index is just pointing to another pointer...))

Regards, Hans!

P.S The edit was for my poor english skills 😉

Edited by - hanslindgren on 07/11/2003 04:40:11 AM

Antares686 SSC Guru Points: 125444 More actions · Answer 11

quote:
Nonclustered indexes can be defined on a table with a clustered index, a heap, or an indexed view. In Microsoft® SQL Server™ 2000, the row locators in nonclustered index rows have two forms:
If the table is a heap (does not have a clustered index), the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The entire pointer is known as a Row ID.
If the table does have a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server 2000 makes duplicate keys unique by adding an internally generated value. This value is not visible to users; it is used to make the key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.

Thanks Hans. Knew I had read the details somewhere and could put my fingers on it. But As I posted I show exactly what is described in the above statement.

Hans Lindgren SSChampion Points: 10496 More actions · Answer 12

Antares:

I forgot to mention that your demonstration was exceptionally exhaustive and nice 🙂

Regards, Hans!

Frank Kalis SSC Guru Points: 111183 More actions · Answer 13

quote:
Antares:
I forgot to mention that your demonstration was exceptionally exhaustive and nice 🙂

yes it was indeed

...and proofed what Brian Kelley mentioned in another thread NOT to rely only on one single source of information!

Cheers,

Frank

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

pjotrb SSC Rookie Points: 34 More actions · Answer 14

Thanks all for the great replies! I will try again to convince our DBA to revise some of his methods.

If I understand correctly, a datapage which becomes full does not mean that all data has to be moved, just the data from that page is split in two pages. So the overhead of a page split does not grow as a table grows.

However this may not apply to the time needed to rebuild the other (non-clustered) indexes. Or does it simply do a partial update of those too?