Uniqueidentifiers are a useful tool, but not one that is universally loved.
I'm going to start by covering some background, then talk about how they can be
used in both SQL 2000 & 2005, and finish with some thoughts about why
uniqueidentifiers are of special interest to developers doing data access.
Let's start with some terminology. What we SQL people call uniqueidentifiers
most people call Globally Unique Identifiers (GUID's). It's a sixteen byte
number generated using a combination of different data elements (commonly the
system time and a portion of the network card MAC address, but there are
variations) in a way that is supposed to guarantee it will be unique. You can
read a slightly more detailed overview on
Wikipedia or get into some
serious detail with this never quite finished
Internet Draft (which notes that the generation algorithm supports key
generation rates as high as ten million per second!).
Most you are familiar with identity (auto number) columns. If you're in a
situation where you need to easily generate a unique key they are very handy,
but can be troublesome in a couple situations. One is where as a developer you
find it useful to be able to generate a new key on the client without having to
make a round trip to the server. Another is when using replication, inserting
rows into an updatable subscriber require some extra work to manage identity key
ranges. Uniqueidentifiers easily address both of those issues, with the
corresponding tradeoff that they are harder to read (they look like this:
5da3c440-0a40-11db-9cd8-0800200c9a66) and they require sixteen bytes of storage,
where identities are usually stored as four byte integers.
In SQL 2000 & 2005, it's easy to generate a uniqueidentifier, just call the
NewID() function. For example, this would return a new id:
select newid()
Nothing complicated about that. It's also possible to use it as a default on
a column so that any row inserted will have a key generated automatically.
There is one additional complication on the database side. While the
uniqueidentifiers are guaranteed unique, they are not guaranteed to be
sequential. That means as you generate new uniqueidentifiers as primary keys and
insert them into a table there is a much higher chance you'll cause a page
split. SQL stores data on 8k pages, a page split occurs when there is not enough
room on the page for the new row, so the page is basically split into two pages
so that the insert can success. Page splits in general are not bad, it's the
frequency of page splits that can have an impact on performance as each split
requires additional locking and disk IO. In my view the worry about page splits
is overblown, but it's something you'll have to access in your environment to be
sure. Fast drives do much to alleviate this potential issue.
In SQL 2000 there was an additional option, but you had to look for it. If
you visit SQLDev.Net you can
download an extended stored procedure that will generate sequential (but still
unique) uniqueidentifiers.
SQL 2005 offers a new option, a functional called NewSequentialID() that is
guaranteed to generate a higher uniqueidentifier than previously generated on
that machine. This effectively neutralizes the page splitting concern, but at a
small cost - as noted in BOL because of the way they are calculated, it is
possible someone could guess the next key. That point is worth remembering if
you're exposing those ID's at all, for instance as a member id on a web site
where you might be passing the ID via a URL parameter. The tradeoff is that this
function ONLY works as a default on a column, you cannot call it directly.
There is another part of the performance story. Each non clustered index
contains the clustered key. That's an additional four bytes per row if you're
using an integer, but an additional sixteen bytes per row with a
uniqueidentifier. That means you need more disk space and will be able to fit
fewer rows on each index page, which equates to more page reads to use an index.
Developers often benefit from being able to create a new key without
making a round trip to a server. One scenario might be where you are letting
someone build up a set of order details on the client and them submit them all
at once. If you're using an identity column you have to insert the rows and then
return the keys, if you're using a uniqueidentifier you can generate them on the
client by calling the Windows CoCreateGuid API in VB6, or the
System.Guid.NewGuid method of the .Net framework.
It's easy to get caught up in the potential performance negatives of
uniqueidentifiers. Performance is an important part of the story to be sure, but
not the entire story. Typically the two groups that will benefit most from using
them are DBA's making heavy use of replication and developers that are trying to
avoid round trips (which also impacts performance!).
I hope all of this gets you thinking about the use of uniqueidentifiers in
general, and if page splitting was the reason for not using them before, maybe
it's time to revisit that decision.