GUIDs as clustered index

  • Hi All,

    Looking for some suggestions.

    Currently, In our system we have GUIDs are primary keys and as the database is keep growing, seeing serious index fragmentation issues. Thats the why initial development happened using .net Entity Framework model and generating a GUID in front end and storing in the database. The other part is, there are lot of child tables which are
    referencing these GUIDs. Now the business wanted to get rid of the GUIDs based primary key and clustered index.
    So, they came up with a workaround as below.

    - Introduced a new column as BIGINT column.
    - Made the bigint column as NOT NULL IDENTITY(1,1) with clustered index on it
    - Retained the Primary Key constraint on the GUID column.

    Note: the searching is still based based on GUIDs.

    Now, I want to know if there is any problem with such approach? Does it eliminate the index fragmentation? Any performance issues in this approach.
    From my point of view, I see still see the record size,tablesize,backup size etc... overhead of maintaining those GUIDs.
    How can we completely get rid of these GUIDs.

    Thanks,

    Sam

  • If your worry is just the index fragmentation you probably can avoid it if you change your application to generate a sequential guid instead of a "normal" guid.

    https://blogs.msdn.microsoft.com/dbrowne/2012/07/03/how-to-generate-sequential-guids-for-sql-server-in-net/

    your option of keeping the guid as a non clustered index and if it is still the main column to use for querying/joining will most likely just make your queries go slower as they will need to access both the guid index and the clustered index to get the data out of it.

  • vsamantha35 - Wednesday, May 30, 2018 5:43 AM

    Hi All,

    Looking for some suggestions.

    Currently, In our system we have GUIDs are primary keys and as the database is keep growing, seeing serious index fragmentation issues. Thats the why initial development happened using .net Entity Framework model and generating a GUID in front end and storing in the database. The other part is, there are lot of child tables which are
    referencing these GUIDs. Now the business wanted to get rid of the GUIDs based primary key and clustered index.
    So, they came up with a workaround as below.

    - Introduced a new column as BIGINT column.
    - Made the bigint column as NOT NULL IDENTITY(1,1) with clustered index on it
    - Retained the Primary Key constraint on the GUID column.

    Note: the searching is still based based on GUIDs.

    Now, I want to know if there is any problem with such approach? Does it eliminate the index fragmentation? Any performance issues in this approach.
    From my point of view, I see still see the record size,tablesize,backup size etc... overhead of maintaining those GUIDs.
    How can we completely get rid of these GUIDs.

    Thanks,

    Sam

    I agree with federico...  in the long run, it's just a different index that gets fragmented.   If you count the number of DISTINCT guid values, and that number's forseeable count some 5 to 10 years down the road is still within the range of BIGINT, then it might make the most sense to do the hard work now and establish a temporary table that maps the GUID values to BIGINT values, and then use that table to support the re-mapping exercise for all the child tables.   Of course, your application will have to stop generating GUID values and instead generate BIGINT values.   A sequence object can help with that.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Best would be to use a bigint SEQUENCE object and get rid of the guids completely.  The app can still pre-set the values as it did for guids, which was probably the reason for using guids in the first place, so they could be assigned ahead of time by the app and not at table insert time.  But you can do that using a SEQUENCE too, without all the overhead and hassles of guids.

    Next best -- a much-less-good option -- is to use a bigint clus index on the main table, but keep the guid in the table, and also an index on the table on only ( guid, bigint_value ) to allow SQL to convert the guid(s) to bigint(s).  For only ~33 byte (24 plus overhead bytes) rows, the fragmentation won't be that.  And the rows are small enough you could lower the fillfactor to 80 without causing too much disk loss, and that would drastically reduce page splits anyway.  Just periodically rebuild the index as the pages fill in.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • vsamantha35 - Wednesday, May 30, 2018 5:43 AM

    How can we completely get rid of these GUIDs.

    Thanks,

    Sam

    Change the application

  • vsamantha35 - Wednesday, May 30, 2018 5:43 AM

    Hi All,

    Looking for some suggestions.

    Currently, In our system we have GUIDs are primary keys and as the database is keep growing, seeing serious index fragmentation issues. Thats the why initial development happened using .net Entity Framework model and generating a GUID in front end and storing in the database. The other part is, there are lot of child tables which are
    referencing these GUIDs. Now the business wanted to get rid of the GUIDs based primary key and clustered index.
    So, they came up with a workaround as below.

    - Introduced a new column as BIGINT column.
    - Made the bigint column as NOT NULL IDENTITY(1,1) with clustered index on it
    - Retained the Primary Key constraint on the GUID column.

    Note: the searching is still based based on GUIDs.

    Now, I want to know if there is any problem with such approach? Does it eliminate the index fragmentation? Any performance issues in this approach.
    From my point of view, I see still see the record size,tablesize,backup size etc... overhead of maintaining those GUIDs.
    How can we completely get rid of these GUIDs.

    Thanks,

    Sam

    There are a lot of suggestions in this discussion.  One of the more important aspects is that changing to use and IDENTITY column may do absolutely NOTHING to help fragmentation.  It doesn't matter what the PK is, what the key of any index is, nor whether or not such key are "ever-increasing" or not if you INSERT into the table and then later update ANY variable width column from NULL or Empty String to something else or you update a variable width column with larger data.

    I need to know if that's the case with your table and, do yourself a favor, don't make any changes to what you have until we hammer this out because, again, such changes may be useless when it comes to page splits, fragmentation, and performance of your system.  Even an "ever increasing" Clustered Index will add to your problems.  We can, in fact, make it so that there will be no page splits of a GUID based index (Clustered or not) for up to 8 weeks at a time (depending on the width of the index) with just a couple of incredibly simple changes to your index maintenance settings.

    I can also guarantee you that if you're doing nothing for fragmentation until you reach 10% fragmentation and then doing REORGANIZE for 10% to 30% and then doing a REBUILD if you happen to trip across the 30% mark whether or not a table/index is an Insert/Update table or not, that you're actually perpetuating page splits, perpetuating fragmentation, perpetuating performance and blocking problems that you don't even know you have and, in a highly ironic "Catch 22", you're perpetuating the need for Index Maintenance whether its a GUID index or not.

    My recommendation is to post the CREATE TABLE code for the table in question along with ALL indexes/constraints so I can have a look at it.  I also need to know if the table is suffering from the INSERT/expAnsive UPDATE problem that I told you about above.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • frederico_fonseca - Wednesday, May 30, 2018 7:31 AM

    If your worry is just the index fragmentation you probably can avoid it if you change your application to generate a sequential guid instead of a "normal" guid.

    https://blogs.msdn.microsoft.com/dbrowne/2012/07/03/how-to-generate-sequential-guids-for-sql-server-in-net/

    your option of keeping the guid as a non clustered index and if it is still the main column to use for querying/joining will most likely just make your queries go slower as they will need to access both the guid index and the clustered index to get the data out of it.

    " if it is still the main column to use for querying/joining will most likely just make your queries go slower as they will need to access both the guid index and the clustered index to get the data out of it." ----  Can u elaborate on this pl ?? can u give an example on how it is going hit/visit both the indices on how it will be a slow operation??  want  to visualize  how expensive it is.

  • sgmunson - Wednesday, May 30, 2018 10:48 AM

    vsamantha35 - Wednesday, May 30, 2018 5:43 AM

    Hi All,

    Looking for some suggestions.

    Currently, In our system we have GUIDs are primary keys and as the database is keep growing, seeing serious index fragmentation issues. Thats the why initial development happened using .net Entity Framework model and generating a GUID in front end and storing in the database. The other part is, there are lot of child tables which are
    referencing these GUIDs. Now the business wanted to get rid of the GUIDs based primary key and clustered index.
    So, they came up with a workaround as below.

    - Introduced a new column as BIGINT column.
    - Made the bigint column as NOT NULL IDENTITY(1,1) with clustered index on it
    - Retained the Primary Key constraint on the GUID column.

    Note: the searching is still based based on GUIDs.

    Now, I want to know if there is any problem with such approach? Does it eliminate the index fragmentation? Any performance issues in this approach.
    From my point of view, I see still see the record size,tablesize,backup size etc... overhead of maintaining those GUIDs.
    How can we completely get rid of these GUIDs.

    Thanks,

    Sam

    I agree with federico...  in the long run, it's just a different index that gets fragmented.   If you count the number of DISTINCT guid values, and that number's forseeable count some 5 to 10 years down the road is still within the range of BIGINT, then it might make the most sense to do the hard work now and establish a temporary table that maps the GUID values to BIGINT values, and then use that table to support the re-mapping exercise for all the child tables.   Of course, your application will have to stop generating GUID values and instead generate BIGINT values.   A sequence object can help with that.

    I Agree. Thank you.

  • p.s.  While I hate the use of GUIDs (I simply don't like looking at them or typing them) and totally contrary to popular belief, it's actually quite easy to manage them and their indexes.  The fact that they're random (unless you're also making the mistake of using sequential GUIDs) doesn't actually matter much (in fact... they great for inserts because they prevent the "IDENTITY HOTSPOT" problem) if you have the INSERT/expAnsive UPDATE problem that I told you about above. 

    Seriously... get back to me with the stuff I asked for before you good folks go making changes that may not help at all.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Wednesday, May 30, 2018 3:01 PM

    vsamantha35 - Wednesday, May 30, 2018 5:43 AM

    Hi All,

    Looking for some suggestions.

    Currently, In our system we have GUIDs are primary keys and as the database is keep growing, seeing serious index fragmentation issues. Thats the why initial development happened using .net Entity Framework model and generating a GUID in front end and storing in the database. The other part is, there are lot of child tables which are
    referencing these GUIDs. Now the business wanted to get rid of the GUIDs based primary key and clustered index.
    So, they came up with a workaround as below.

    - Introduced a new column as BIGINT column.
    - Made the bigint column as NOT NULL IDENTITY(1,1) with clustered index on it
    - Retained the Primary Key constraint on the GUID column.

    Note: the searching is still based based on GUIDs.

    Now, I want to know if there is any problem with such approach? Does it eliminate the index fragmentation? Any performance issues in this approach.
    From my point of view, I see still see the record size,tablesize,backup size etc... overhead of maintaining those GUIDs.
    How can we completely get rid of these GUIDs.

    Thanks,

    Sam

    There are a lot of suggestions in this discussion.  One of the more important aspects is that changing to use and IDENTITY column may do absolutely NOTHING to help fragmentation.  It doesn't matter what the PK is, what the key of any index is, nor whether or not such key are "ever-increasing" or not if you INSERT into the table and then later update ANY variable width column from NULL or Empty String to something else or you update a variable width column with larger data.

    I need to know if that's the case with your table and, do yourself a favor, don't make any changes to what you have until we hammer this out because, again, such changes may be useless when it comes to page splits, fragmentation, and performance of your system.  Even an "ever increasing" Clustered Index will add to your problems.  We can, in fact, make it so that there will be no page splits of a GUID based index (Clustered or not) for up to 8 weeks at a time (depending on the width of the index) with just a couple of incredibly simple changes to your index maintenance settings.

    I can also guarantee you that if you're doing nothing for fragmentation until you reach 10% fragmentation and then doing REORGANIZE for 10% to 30% and then doing a REBUILD if you happen to trip across the 30% mark whether or not a table/index is an Insert/Update table or not, that you're actually perpetuating page splits, perpetuating fragmentation, perpetuating performance and blocking problems that you don't even know you have and, in a highly ironic "Catch 22", you're perpetuating the need for Index Maintenance whether its a GUID index or not.

    My recommendation is to post the CREATE TABLE code for the table in question along with ALL indexes/constraints so I can have a look at it.  I also need to know if the table is suffering from the INSERT/expAnsive UPDATE problem that I told you about above.

    Hi Jeff,
    Thanks for taking time and providing insights. thanks.

    My recommendation is to post the CREATE TABLE code for the table in question along with ALL indexes/constraints so I can have a look at it.
      -- I can get you the information.
    I also need to know if the table is suffering from the INSERT/expAnsive UPDATE problem that I told you about above.       
       --   how can we validate this? what u r saying , if that table is having more INSERTS or more UPDATES??? Is there any DMV to pull this information out?

  • vsamantha35 - Wednesday, May 30, 2018 3:14 PM

    Hi Jeff,
    Thanks for taking time and providing insights. thanks.

    My recommendation is to post the CREATE TABLE code for the table in question along with ALL indexes/constraints so I can have a look at it.
      -- I can get you the information.
    I also need to know if the table is suffering from the INSERT/expAnsive UPDATE problem that I told you about above.       
       --   how can we validate this? what u r saying , if that table is having more INSERTS or more UPDATES??? Is there any DMV to pull this information out?

    I might be able to SWAG IT by looking at the CREATE TABLE statement and the indexes.  Also, how many rows in your largest 4 tables?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Wednesday, May 30, 2018 3:05 PM

    p.s.  While I hate the use of GUIDs (I simply don't like looking at them or typing them) and totally contrary to popular belief, it's actually quite easy to manage them and their indexes.  The fact that they're random (unless you're also making the mistake of using sequential GUIDs) doesn't actually matter much (in fact... they great for inserts because they prevent the "IDENTITY HOTSPOT" problem) if you have the INSERT/expAnsive UPDATE problem that I told you about above. 

    Seriously... get back to me with the stuff I asked for before you good folks go making changes that may not help at all.

    Out of curiosity why do you say "(unless you're also making the mistake of using sequential GUIDs)"

  • Whatever kind of identity you choose to use - it's the worst possible choice for a clustered key.

    The best candidate for a clustered index is a column (set of columns) which is usually used for range selection.
    I've never seen a query using something like "WHERE ID BETWEEN @FromID and @ToID"

    _____________
    Code for TallyGenerator

  • Jeff Moden - Wednesday, May 30, 2018 3:28 PM

    vsamantha35 - Wednesday, May 30, 2018 3:14 PM

    Hi Jeff,
    Thanks for taking time and providing insights. thanks.

    My recommendation is to post the CREATE TABLE code for the table in question along with ALL indexes/constraints so I can have a look at it.
      -- I can get you the information.
    I also need to know if the table is suffering from the INSERT/expAnsive UPDATE problem that I told you about above.       
       --   how can we validate this? what u r saying , if that table is having more INSERTS or more UPDATES??? Is there any DMV to pull this information out?

    I might be able to SWAG IT by looking at the CREATE TABLE statement and the indexes.  Also, how many rows in your largest 4 tables?

    Hi Jeff, Please find the attached tablescript. For sake for simplicity considering 1 parent table and 2 child tables which are GUIDs. 

    Also, on prod I already have existing data, how can I use an UPDATE statement and replacing the GUIDs with an int/bigint column value and same homework has to be done to all the child tables, how would be my update statement in parent child tables and permanantly get rid of GUIDs then what approach to be taken. Could you please give simple demo example with some few rows on how I can get rid of those GUIDS permanantly from parent and child table.

    Again, thanks everyone for the help so far.

  • Can you supply the DDL for the original tables, not the ones you have altered to add the newid column

Viewing 15 posts - 1 through 15 (of 58 total)

You must be logged in to reply to this topic. Login to reply