Primary Key: GUID vs. INT

  • Hi Folk,

    is it a big performance loose when using GUIDs as Primary Keys?

    Greetz Teutales

  • Unless your requirements document requires that an actual GUID be used (for a justifiable reason) I would use an integer key of some sort. The size of the integer (int or bigint) will again depend upon the specification (how many rows will the table need to ultimately maintain) but obviously, the smaller the better for performance reasons.

    If a GUID must be used because of some specification further up the line it is best to implement the PK as a non-clustered index and put a clustered index on another data entity that would provide better storage organization (like a datetime column or perhaps an identity).

    Having said all that, any further comments or suggestions would depend upon details you have not provided. So these recommendations are general in nature.

    The probability of survival is inversely proportional to the angle of arrival.

  • Nothing wrong with a guid primary key. What causes performance problems is a Guid column used as the clustered index key. So if you're going to have Guid primary keys, put the clustered index elsewhere.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (3/26/2012)


    Nothing wrong with a guid primary key. What causes performance problems is a Guid column used as the clustered index key. So if you're going to have Guid primary keys, put the clustered index elsewhere.

    Does this depend on how the guids are generated, i.e. NEWSEQUENTIALID() vs NEWID() ?

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Mark-101232 (3/26/2012)


    GilaMonster (3/26/2012)


    Nothing wrong with a guid primary key. What causes performance problems is a Guid column used as the clustered index key. So if you're going to have Guid primary keys, put the clustered index elsewhere.

    Does this depend on how the guids are generated, i.e. NEWSEQUENTIALID() vs NEWID() ?

    Yes, to a degree. NewSequentialID is better, but it's not entirely sequential all the time.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (3/26/2012)


    Nothing wrong with a guid primary key. What causes performance problems is a Guid column used as the clustered index key. So if you're going to have Guid primary keys, put the clustered index elsewhere.

    I agreee Gail, but there is something else to take into consideration. If you have tons of foreign key references to this primary key, those will have to be GUIDs as well, causing those tables to grow. I've seen scenarios where the rows took up three times more space than they would if ints were used, and that does indeed have negative performance impact.



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL[/url]

  • Nothing wrong with a guid primary key...

    Nothing wrong! But with a real life experience you learn, that it would be much better to stick with INTs or BIGINTs for many, many, many reasons...

    Wait for J Celko reply here :hehe:

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (3/27/2012)


    Nothing wrong with a guid primary key...

    Nothing wrong! But with a real life experience you learn, that it would be much better to stick with INTs or BIGINTs for many, many, many reasons...

    I used to work with a system that had Guid primary keys on all tables. Once I got the clustered indexes elsewhere it was fine. There are reasons for guids over identity columns. Distributed system and front end/middle tier allocated key values being just two of them.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (3/27/2012)


    Eugene Elutin (3/27/2012)


    Nothing wrong with a guid primary key...

    Nothing wrong! But with a real life experience you learn, that it would be much better to stick with INTs or BIGINTs for many, many, many reasons...

    I used to work with a system that had Guid primary keys on all tables. Once I got the clustered indexes elsewhere it was fine. There are reasons for guids over identity columns. Distributed system and front end/middle tier allocated key values being just two of them.

    I've seen solutions where client side PK allocation didn't need to use GUID's, INT PK doesn't need to be an identity...

    Hopefully, with sequences, finally introduced to SQLServer 2012, needs of using GUID's as PK, due to requirements of ID allocation in a client/middle tier, will go away.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (3/27/2012)


    Hopefully, with sequences, finally introduced to SQLServer 2012, needs of using GUID's as PK, due to requirements of ID allocation in a client/middle tier, will go away.

    That solves a different problem really. The point of GUIDs is that they're globally unique, so you can have multiple unconnected client/middle tiers that can safely generate an ID without worrying about collision with other clients. For example, we have a distributed web environment where I work that generates analytics data and all the session data hangs together with a GUID generated from the web tier and this data is then asyncronously streamed back to a SQL Server for reporting.

    This means that there doesn't need to be any shared storage on the web tier and we can easily use the session GUID as the PK for the session.

    There are other ways to solve the problem, but as storage/computing power increases, the width of the data gets less of a concern

  • Eugene Elutin (3/27/2012)


    GilaMonster (3/27/2012)


    Eugene Elutin (3/27/2012)


    Nothing wrong with a guid primary key...

    Nothing wrong! But with a real life experience you learn, that it would be much better to stick with INTs or BIGINTs for many, many, many reasons...

    I used to work with a system that had Guid primary keys on all tables. Once I got the clustered indexes elsewhere it was fine. There are reasons for guids over identity columns. Distributed system and front end/middle tier allocated key values being just two of them.

    I've seen solutions where client side PK allocation didn't need to use GUID's, INT PK doesn't need to be an identity...

    No they don't, but if doing client-side allocation of int values is horridly hard. Can't use a incrementing key without huge amounts of logic to check for/prevent race conditions, and random ints just get us back to the fragmentation problem that guids have.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • ...

    No they don't, but if doing client-side allocation of int values is horridly hard. Can't use a incrementing key without huge amounts of logic to check for/prevent race conditions, and random ints just get us back to the fragmentation problem that guids have.

    There're ways and methods...

    And with creative approach, they are not so "horridly hard". Even, with using GUID's, if you want your client/middle layer to be super fast, you would want to cache pre-allocated range of keys. You can quite easily do it with incremental integers. Sophistication and elegance of solution will depend on dev's creativity 😉

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (3/27/2012)


    Even, with using GUID's, if you want your client/middle layer to be super fast, you would want to cache pre-allocated range of keys.

    Why would you want to cache ranges of keys when using Guids? Makes no sense at all. The main reason for using guids if allocating keys in the front/middle tier is that they can be generated without any round trips to the database, without any caching and they won't overlap with guids generated by other machines.

    Oh, and I've seen several 'creative approaches', I've cleaned up the mess they left too.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • ...

    Why would you want to cache ranges of keys when using Guids? Makes no sense at all. The main reason for using guids if allocating keys in the front/middle tier is that they can be generated without any round trips to the database, without any caching and they won't overlap with guids generated by other machines.

    Oh, and I've seen several 'creative approaches', I've cleaned up the mess they left too.

    1. Because there are some cases where you don't want to spent any single CPU cycle for anything else than required calculation task (example: auto-trading systems, which creating the economy mess around the world :-))

    2. Yes, GUID's make it a bit "off-shelve" solution for not having to do round trips, but if you can do one single round-trip to reserve and cache the range of integer id's, you don't need to worry about round trips much.

    3. Critical system may not like the idea of possibility of generating duplicate GUID's (yeah, highly unlikely - but possible!)

    4. Oh, me too, I've also seen several 'creative mess approaches'. I've seen them both: with or without using GUID's.

    See, international outsourcing is not as bad as it seams. It makes us employed and justifies high rates paid! Someone need to do "dirty" clean up jobs...;-)

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (3/27/2012)


    ...

    Why would you want to cache ranges of keys when using Guids? Makes no sense at all. The main reason for using guids if allocating keys in the front/middle tier is that they can be generated without any round trips to the database, without any caching and they won't overlap with guids generated by other machines.

    Oh, and I've seen several 'creative approaches', I've cleaned up the mess they left too.

    1. Because there are some cases where you don't want to spent any single CPU cycle for anything else than required calculation task (example: auto-trading systems, which creating the economy mess around the world :-))

    You still never answered the question. You said earlier you would do a round trip for purposes of caching a bunch of GUIDs:

    Even, with using GUID's, if you want your client/middle layer to be super fast, you would want to cache pre-allocated range of keys.

    In what scenario would that be a good idea when there are plenty of options available for generating a GUID upstream, in an application tier? If one were serious about saving CPU cycles, one would not do a round trip to the database for purposes of generating a GUID.

    4. Oh, me too, I've also seen several 'creative mess approaches'. I've seen them both: with or without using GUID's.

    See, international outsourcing is not as bad as it seams. It makes us employed and justifies high rates paid! Someone need to do "dirty" clean up jobs... 😉

    What are you getting at? That seems like quite the blanket statement.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply