An alternative to GUIDs

  • You're welcome 🙂

    "Simple" meant that it was easy to understand, people friendlier (for both company and customer) than some I've seen, and easy to implement (code and test).

    Per the "people friendly" point, I do not understand why some customer numbers are umteen chars long without any punctuation. They should be easy for the customer to recite and write, too.

    Enough ranting.

    "Happy" Jack

  • Great article, and great responses. I'm a developer on a team working on a redesign of a Visual FoxPro application. Initially the project scope will focus on migrating the VFP data to SQL Server along with redesigning the VFP app to use SQL as the backend. Then a new .NET app will be built to replace the VFP app.

    One of the topics that comes up frequently in our discussions is touched on by this article... namely:

    "How can we know what the parent/child/grandchild id's are so we can pre-populate PK/FK fields before we start a transaction?"

    The diagram in the article seems to address this, and to me also seems to throw transactions out the door. Is this normal? After persisting the parent, and retrieving the PK, what happens if/when child records cannot be persisted and we need to roll back to a state prior to the parent being persisted?

    Our existing VFP app has a "key generator" function that increments an integer value stored in an "id table" which is just a table with an integer field. Since the database is always local to the app (on a LAN), performance is fine.

    But, with our redesign, we will be migrating multiple geographically dispersed VFP databases into a single SQL Server instance, and now WAN and Internet latency will become a factor. I can see that repeated calls to either a SP to retrieve an ID value, or repeated calls to retrieve auto-incremented identity values, might become a performance bottleneck.

    Why are GUID's so bad? As long as they're not used as meaningful ID's (I.E. something people need to read and understand as part of the data) who cares how ugly they are? What realistic performance issues are there in SQL Server with GUID's?

    If someone can point me to resources documenting why GUID's are bad for SQL Server, it would help us.

    Again, great article, and thoughtful responses. Thank you.

  • If identity integer columns cannot be used because of the possibility of the same number being generated by two different servers, why not make the primary key a composite key that consists of the identity column + a tinyint that indicates which server it came from?

  • Right, I've seen it mentioned on here as well, but we use GUID COMBs as well (a GUID which has as it's basis a stamp in time). We have one FUNCTION in the database and all our "ID" columns use this function to derive their primary key in stead of "NEWID()" or "NEWSEQUENTIALID()". In the code, the same is done so we don't have to hit the database. A "helper" or utility class was created so we can just say "SomeObject.ID = DataUtility.NewSequentialID()". We couldn't do it with .NET Extension methods because extension methods rely on a instance already being created in order to pick up their intellisense (ex. this makes no sense to do: Guid.NewGuid().NewSequentialID()). So a static class/method was used.

    So, don't reinvent the wheel in this case, the GUID COMBs work great! I've also been able to insert millions of rows using that function and it was fast! Also, there is little to no fragmentation in the database because these ARE sequential IDs.

  • ComputerFieldsInc (9/15/2011)


    Great article, and great responses. I'm a developer on a team working on a redesign of a Visual FoxPro application. Initially the project scope will focus on migrating the VFP data to SQL Server along with redesigning the VFP app to use SQL as the backend. Then a new .NET app will be built to replace the VFP app.

    One of the topics that comes up frequently in our discussions is touched on by this article... namely:

    "How can we know what the parent/child/grandchild id's are so we can pre-populate PK/FK fields before we start a transaction?"

    The diagram in the article seems to address this, and to me also seems to throw transactions out the door. Is this normal? After persisting the parent, and retrieving the PK, what happens if/when child records cannot be persisted and we need to roll back to a state prior to the parent being persisted?

    Our existing VFP app has a "key generator" function that increments an integer value stored in an "id table" which is just a table with an integer field. Since the database is always local to the app (on a LAN), performance is fine.

    But, with our redesign, we will be migrating multiple geographically dispersed VFP databases into a single SQL Server instance, and now WAN and Internet latency will become a factor. I can see that repeated calls to either a SP to retrieve an ID value, or repeated calls to retrieve auto-incremented identity values, might become a performance bottleneck.

    Why are GUID's so bad? As long as they're not used as meaningful ID's (I.E. something people need to read and understand as part of the data) who cares how ugly they are? What realistic performance issues are there in SQL Server with GUID's?

    If someone can point me to resources documenting why GUID's are bad for SQL Server, it would help us.

    Again, great article, and thoughtful responses. Thank you.

    For why GUID's are bad for SQL Server you should start at Kimberly Tripp's blog. Here's a link, http://sqlskills.com/BLOGS/KIMBERLY/category/GUIDs.aspx. The top blog post on the link is the second so scroll down and read the bottom post first.

    It's no so much that GUID's are bad, but more about how you use them. They use more space than most applications really need which means fewer rows fit on a page which means more reads and IO is typically the biggest performance problem.

  • Utilising a separate table to store the highest existing identifier is not a good strategy for performance because it introduces a hot-spot into the insertion process: In order to insert a record, the single record containing the existing highest number must be locked for the duration of the insertion transaction.

    In contrast, the SCOPE_IDENTITY() function when used with an identity column does not suffer this limitation, because a new number is created by the identity column upon insert regardless of whether the insertion transaction succeeds or is rolled back. This means there is no contention imposed by the process that generates the next higher number, albeit that sequential records may have gaps in the numerical sequence because of a failure in a previous insertion. This is why the identity column is the preferred method for surrogate key generation.

    GUIDs also have their uses: They are particularly useful in distributed databases where unique identifiers must be created for records in multiple databases/servers. However the downside is that the GUID is a much larger datatype than an int, which means that the indexes created are much larger and therefore suffer from a reduced scan density.

    This is not such a problem if the GUID is used as the only index on a table or is a non-clustered index: However where the GUID forms part of the clustered index key, there is a consequential bloat on any non-clustered indexes created on the same table because the clustered index key is stored within every non-clustered index created on the clustered table. As a result, the scan density (and therefore performance) of the non-clustered indexes are therefore also reduced significantly.

    Therefore I would recommend that database designers avoid GUIDs where the table is large and if using a GUID, then the GUID should not form part of the clustered index.

  • To synchronize the two server clocks to 1 millisecond precision is not a trivial task.

  • Slightly off topic, but the author's point about developers loving GUIDs is spot on. After all, they are absolutely guaranteed to be unique keys. The problem, of course, is their size and essential randomness.

    I once needed the ability for a distributed knowledge base, sort of like a wiki with relational capabilities thrown in. Articles needed to be unique across the world, and would link to other (arbitrary) articles as an author desired.

    The twist is that each application maintained its own (fairly small - tens of thousands of articles) local database--but there were repositories of articles and those repositories needed to link articles with related articles, and then be able to export the whole mess on demand to import to the local author's database.

    But GUID performance is poor, so what I did was create a GUID to LUID (local ID) table. Basically the GUID was used only when importing/exporting articles, and internally (locally) everything was referenced by LUID (a normal identity field).

    It worked well. Performance was good, global uniqueness was maintained, giving me the best of both worlds. As long as repositories didn't exceed 2 billion articles everything was golden. 🙂

  • Actually, "roger.plowman" brings up an interesting idea. I've also used a design in the past with 2 tables (ExternalLinks, and ExternalValues). Basically, we had many different external systems we'd "link" data from into our system (external ordering systems, Outlook Appointment IDs, Outlook Task IDs, etc.). Basically, the "ExternalLinks" table stored information on the "type" of link (it described the link). The "ExternalValues" stored the actual "internal" vs. "external" value. So, translation between systems was very easy and we were even able to, in some cases, create Views which mimicked the external systems data.

    No, this probably won't solve any replication problems (unless you're *gasp* rolling out your own synchronization), but it's something to keep in your back pocket.

  • hxchi (9/15/2011)


    To synchronize the two server clocks to 1 millisecond precision is not a trivial task.

    Especially because Windows internal system clock has a default resolution of 3 milliseconds!

  • Pascal Declercq (9/15/2011)


    Quote: "What happens after 40 years (well actually nearer 60 years)? Well, it will be job security for my great-grandchildren so I'm not going to worry about it."

    I thought we IT guys had learned our lesson after the millenium bug ...

    I think I learnt two things.

    1. Make sure you will be pushing up the daisies when your solution breaks.

    2. There is a hell of a lot of money to be made in such occurencies:hehe:

  • iposner (9/15/2011)


    hxchi (9/15/2011)


    To synchronize the two server clocks to 1 millisecond precision is not a trivial task.

    Especially because Windows internal system clock has a default resolution of 3 milliseconds!

    I don't see the importance of this. If records are being generated on both servers, maybe you think that there could be collisions, but byte 14 would always be different between the servers, so it just could not happen.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • By looking at the other posts, I think that some are confusing Primary Keys, which should only be used for referential integrity and not used by humans, and Reference Keys, which are used by humans. Don't worry if your Primary Key is readable, it should not be used by humans in the first place.

  • Paxman (9/15/2011)


    By looking at the other posts, I think that some are confusing Primary Keys, which should only be used for referential integrity and not used by humans, and Reference Keys, which are used by humans. Don't worry if your Primary Key is readable, it should not be used by humans in the first place.

    And of course, choice of clustering can be independent as well.

  • There are 86400 seconds in a day, not 84400 that you stated in the article. You should probably change some numbers.

Viewing 15 posts - 16 through 30 (of 63 total)

You must be logged in to reply to this topic. Login to reply