May 1, 2006 at 7:08 pm
I notice that most if not all of the tables in SQL Server 2005’s sample database, AdventureWorks, have both a surrogate primary key of integer type and a unique identifier column of GUID type. I know the GUID is probably there for replication but do we really need both? Could you not use the GUID column for both the PK and a unique identifier for replication and remove the integer based surrogate key? Is their an advantage for having both?
Thanks for your time!
May 3, 2006 at 6:07 am
isn't this kind of situation used when you have multiple locations using the identical database, and you need to merge the databases back together at some point? also, i thought foreign keys based on GUIDS were not recommended, that it should always be a int or bigint column that is referenced? so that is why there is an identity field, but the GUID is used when importing to determine if it is really the same record being inserted/updated when merging two or more databases?
Lowell
May 3, 2006 at 12:29 pm
OK, I got off my butt and did the research :p Here is a great article:
http://www.informit.com/articles/printerfriendly.asp?p=25862
The short of it:
GUIDs as PKs kill performance in large tables and large amounts of inserts and it is all because of the NEWID() function (even if you don’t cluster it, if you do cluster it, it can bring the server to its knees). This function hasn’t changed for SQL Server 2005 so everything in that article is valid. Too bad MS overlooked this problem, they could have fixed it for 2005. You can use a GUID as a PK but you have to roll your own as described in the article. I guess 2005 allows you to put that custom logic in C# and it should work even better. Would be a cool exercise…
[Lowell - that's replication and I covered that in my first post]
May 3, 2006 at 1:25 pm
The key is that GUIDS are sorted like all character data so if you have a million rows and you have covered 0-9A-F and you next value is 8xxxxxxxx(whatever) then you will have an insert somewhere in the middle of you data page (whether it be for the data itself because of clustered index or and index page). Get a few million more and the lieklyhood is none of your pages will be much more than 50% full. So you suffer lots of page splits. GUID is great for replication but only in a non-clustered index (your talking fewer potential affected rows) but not as a clustered index, on extrememly large tables thou I think I have decided to get rid of uniqueidentifiers altogther and just use a server numbering system so that each site can take entries and by making the Primary Key the identitiy column and the server id column I can merge them without any issue.
May 3, 2006 at 6:08 pm
What you say is true only if you are using NEWID(). If you roll your own GUIDs like the article shows. You have nearly the same performance as integers but all the advantages of GUIDs.
May 3, 2006 at 6:55 pm
Looks like MS has taken a stab at this:
http://msdn2.microsoft.com/en-us/library/ms189786(en-US,SQL.90).aspx
They have a new function: NEWSEQUENTIALID()
It can _only_ be used as the default value for a column in a table though. I have tried to use it but I get the error “Error validating default column” when I tried to use it has default in a new table. I can only create a table with it if I script the table. But then I get errors when I use the table:
CREATE TABLE myTable (ColumnA uniqueidentifier DEFAULT dbo.myfunction(NEWSEQUENTIALID()))
Anyone know what I am doing wrong?
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply