Clustered indexes question

Question

Post reply

Clustered indexes question

PiMané

SSChampion

Points: 13965
More actions
September 7, 2009 at 10:39 am

#210444

Hi,
I've a table whose primary key is a GUID column... Is it worth having a clustered index over that primary key or, since guids are "random", will that only make the data change "move" a lot on disk? This table has a lot o writes and reads!
Thanks in advance,
Pedro

If you need to work better, try working less...

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

Marcin Gol Hall of Fame Points: 3314 More actions · Answer 1

if you need you may stay play with this index - but you should set a proper fillfactor and maybe padindex;

if you dont need clustered index there - you should use it as a nonclustered index

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 2

Hi Pedro

Why do you need a GUID as primary key? SQL Server works much better with INT primary keys.

If you really need a GUID primary key, you should avoid to define it as clustered index. The clustered index defines the structure of table and is used by other indexes. Since a GUID is not sequential it causes many page splits. As Marcin wrote, you can use a less FILLFACTOR which reduces the problem but doesn't eliminate it. A better solution would be another IDENTITY column which represents the clustered index.

SQL Server 2005 introduced the NEWSEQUENTIALID() function as default value for your UNIQUEIDENTIFIER column. The NEWSEQUENTIALID() method creates an sequential, valid GUID which works much better as clustered index. Anyway, it's still less performing than a INT column and (afaik) there is no function like SCOPE_IDENTITY() to get the last generated id. You have to use the OUTPUT clause to get the last generated ids into a temporary table or any other destination.

Greets

Flo

PiMané SSChampion Points: 13965 More actions · Answer 3

Hi,

We use GUIDs so that records have diferent ids on all databases we have.

Some records can by synchronized across servers (like replication) and we needed an unique key.

Yeah, I know we could have used a combined key (Id - bigint; ServerId - bigint) but the project was already in development when I can here and know GUID's are a cross I have to carry and deal with :S.

Is it a good idea not to have a CLUSTERED INDEX, since the PK uses GUIDs, or should I create one using an Id column, BIGINT - IDENTITY(1,1), (used only for the clustered index)?

Thanks,

Pedro

If you need to work better, try working less...

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 4

PiMané (9/8/2009)
Is it a good idea not to have a CLUSTERED INDEX, since the PK uses GUIDs, or should I create one using an Id column, BIGINT - IDENTITY(1,1), (used only for the clustered index)?

There are a few reasons for tables without a clustered index (e.g. some special kinds of staging tables) but in most cases a table should have a clustered index. If your table has no clustered index other indexes will run into problems when your data become changed.

My suggestion is to use a (BIG)INT IDENTITY column as clustered index.

PiMané SSChampion Points: 13965 More actions · Answer 5

Thanks,

Since I'm creating this column now, when the table is created and already has data, is there a problem for this column to be the last one on the list, or should I create a temp table, move the data into it, drop the old table and rename the new table (like Management Studio does when you insert a column in the middle of others)?

I've read somewhere that "fixed" size types (int, char, bigint, ...) should always be the first columns and nvarchar, image, ... should be the last ones. Is this true or SQL Server manages this problem (nvarchar growing causing page "breaks" and data having to move to another page)?

Thanks once again,

Pedro

If you need to work better, try working less...

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 6

I don't think that this is a problem since the index tree is maintained without the data structure and points to the row, not to the column.

Probably here are storage engine gurus with a better knowledge than me for this question.

Grant Fritchey SSC Guru Points: 399379 More actions · Answer 7

All the testing I've done with GUID's as a clustered index have shown some serious performance problems. The one thing that alleviates this somewhat is to use the SEQUENTIALID mechanism. There are valid reasons for using these things as PK's, but it really is a bit of a balancing act to get them to work well.

Keep an eye on fragmentation. It's likely to be a bit higher. You might want to look into using index compression to reduce the number of pages you use. I think that'll help (of course, I think that helps in general).

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Grant Fritchey SSC Guru Points: 399379 More actions · Answer 8

Oh, and yes, you should have a clustered index on most of your tables (most being 99.9999999999%) or have a very good reason why not, with testing and good documentation. SQL Server is built and optimized around storing data in a clustered index. It's just a good practice to follow. Yes, there are exceptions, but the exceptions should be exceptional (I love saying that).

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

PiMané SSChampion Points: 13965 More actions · Answer 9

Should I have the clustered index even if it's on a "ghost" column, like in my case the column won't be used for anything but the clustered index?

Right now the table looks like this:

Id - GUID

ClientId - GUID

Data - XML

The primary key is on Id and ClientId, beeing a clustered index.

Should I just create the column RowId - BIGINT IDENTITY(1,1) and create a clustered index on that column and create a non clustered index on Id and ClientId to keep the "connections" to the other tables?

PS: I've got a vertical database, each table has 3 other tables associated to store strings, numbers, datetime values, "connected" by the Id and ClientId.

Thanks,

Pedro

If you need to work better, try working less...

Grant Fritchey SSC Guru Points: 399379 More actions · Answer 10

PiMané (9/8/2009)
Should I have the clustered index even if it's on a "ghost" column, like in my case the column won't be used for anything but the clustered index?
Right now the table looks like this:
Id - GUID
ClientId - GUID
Data - XML
The primary key is on Id and ClientId, beeing a clustered index.
Should I just create the column RowId - BIGINT IDENTITY(1,1) and create a clustered index on that column and create a non clustered index on Id and ClientId to keep the "connections" to the other tables?
PS: I've got a vertical database, each table has 3 other tables associated to store strings, numbers, datetime values, "connected" by the Id and ClientId.
Thanks,
Pedro

No, I wouldn't go that far. You'll end up with lots of key lookups which can be very bad. There has to be a common access path, a column or set of columns that are the most frequently used for pulling the data. That's generally where the cluster should go. Best for performance is a monotonically increasing integer value, an identity, but it's not the only option. It's just that GUID's are a bit large and you're better off keeping the cluster, or any other index, as small as you can. But note, that's as small as you can, not as small as possible. If the GUID is the single point of entry into the table, or the most common point of entry into the table, then it may be the best bet for your cluster. You just need to be aware that it comes with baggage.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

PiMané SSChampion Points: 13965 More actions · Answer 11

Unfortunately the Guid's are the only link between the data tables.

The big problem I've got with the Guid is that it's the application that sends it and I can't use NEWSEQUENTIALID(), and that will generate a lot of page breaks...

But since the "link" between the tables are the 2 Guid columns I don't have other choice than use them as a Clustered Index... I could tell the programmers to change the program but they probably wouldn't like it :D...

Thanks again,

Pedro

If you need to work better, try working less...

PiMané SSChampion Points: 13965 More actions · Answer 12

The data, since we have lots of customers, will be sorted by the ClientId and the Id (record Id), both GUID's (and I can't get away from this, I wish but the application is made from to to bottom using this structure). The isn't much knowledge on how to sort the records since the users can sort the data by any column (click on top of the grid on the column name)... but since this is the "link" between the tables, it seems to be the best option.

Pedro

If you need to work better, try working less...

Grant Fritchey SSC Guru Points: 399379 More actions · Answer 13

stewartc (9/9/2009)
My recommendation then:
clustered index on ClientID, non clustered index in the GUID.

ClientID is a GUID.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

PiMané SSChampion Points: 13965 More actions · Answer 14

Yep. That's right. Booth clientId and Id are Guids.

And since they are the primary key might as well be clustered...

Pedro

If you need to work better, try working less...