How to overcome the identity column "jump 1000" issue

  • Use the SQL start up proc to adjust the identity value back to a lesser-gap value.  You can do a MAX($IDENTITY) to get the highest current value, add 1 (or whatever other value you want), and then RESEED the identity to that value.  That would eliminate whatever part of the gap you didn't want.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Grant Fritchey - Tuesday, March 7, 2017 2:15 AM

    Jeff Moden - Monday, March 6, 2017 9:12 PM

    spaghettidba - Thursday, January 15, 2015 4:40 AM

    It's a known issue. The TF resolves it, but IMHO a gap or jump in identity column should not be a big deal. Is this really an issue for you? If you are exposing your identity value to the end users, you're looking for trouble. Identity is a surrogate key, and as a surrogate it should never be shown to end users in any way. Use business (natural) keys for this.

    I have to challenge that advice, ol' friend.  What "natural" key would you use for a Customer, Employee table, or Invoice table, just to name a few?

    And, yeah, I know it's an old post but it was just necro'd and this is the first time I've seen it.

    You're still going to want some kind of constraint on the table on top of an identity column (presumably as the pk) to ensure business level uniqueness. This is regardless of if you expose an identity to users or not. Even if you do expose those numbers to the users, does it matter if there are gaps?

    Yep.  I get that.  Tell me what that constraint should be for a table with a million customers.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, March 7, 2017 1:00 PM

    Grant Fritchey - Tuesday, March 7, 2017 2:15 AM

    Jeff Moden - Monday, March 6, 2017 9:12 PM

    spaghettidba - Thursday, January 15, 2015 4:40 AM

    It's a known issue. The TF resolves it, but IMHO a gap or jump in identity column should not be a big deal. Is this really an issue for you? If you are exposing your identity value to the end users, you're looking for trouble. Identity is a surrogate key, and as a surrogate it should never be shown to end users in any way. Use business (natural) keys for this.

    I have to challenge that advice, ol' friend.  What "natural" key would you use for a Customer, Employee table, or Invoice table, just to name a few?

    And, yeah, I know it's an old post but it was just necro'd and this is the first time I've seen it.

    You're still going to want some kind of constraint on the table on top of an identity column (presumably as the pk) to ensure business level uniqueness. This is regardless of if you expose an identity to users or not. Even if you do expose those numbers to the users, does it matter if there are gaps?

    Yep.  I get that.  Tell me what that constraint should be for a table with a million customers.

    Last name of course.

    But seriously, it depends on the data. In order to keep the data clean, natural constraints must be put in place as well as artificial ones. I'm not turning into Joe here. I just care about keeping the data as clean as possible.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Jeff Moden - Tuesday, March 7, 2017 1:00 PM

    Yep.  I get that.  Tell me what that constraint should be for a table with a million customers.

    Depends on the system and the client's requirements. I'd start answering that question by having a long conversation with the client about how they identify a customer.
    In South Africa it's common to use the government-issued ID number, because every legal resident has one.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Driving further off into the weeds ....

    Why is it that so many businesses seem to think that an email address or phone number is immutable and therefore is a good candidate for identifying a customer/client/person/etc.?

    First off, they are often shared, especially when people have a land-line and no cell phone number.  And then they get changed, as when that family switches from their land line to multiple cell phones.  Or when someone changes from an email address supplied by an internet provider to something like gmail, because they are changing providers.

    I had to make a list of accounts that I had, and which ones used my new email and which were stuck using my old email.

  • GilaMonster - Wednesday, March 8, 2017 12:16 AM

    Jeff Moden - Tuesday, March 7, 2017 1:00 PM

    Yep.  I get that.  Tell me what that constraint should be for a table with a million customers.

    Depends on the system and the client's requirements. I'd start answering that question by having a long conversation with the client about how they identify a customer.
    In South Africa it's common to use the government-issued ID number, because every legal resident has one.

    If someone isn't a resident but still need to appear in the table, what do they use?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • GilaMonster - Wednesday, March 8, 2017 12:16 AM

    Depends on the system and the client's requirements.

    I didn't say we always use the ID number without exception, in all circumstances, even when we need non-residents. I said it's common to use it.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Something that could make these gaps matter more is a smallint column datatype.

  • GilaMonster - Wednesday, March 8, 2017 3:07 PM

    GilaMonster - Wednesday, March 8, 2017 12:16 AM

    Depends on the system and the client's requirements.

    I didn't say we always use the ID number without exception, in all circumstances, even when we need non-residents. I said it's common to use it.

    Understood and I said nothing contrary to that.  The questions remains, though.  What do you use when you don't use the ID number?  What I'm specifically looking for is the answer to the age old question of what someone would use for a non-incremental key on something like a "Customer" table.  Has any client of yours made such a specification?  Not a challenge here, ol' friend.  Looking for info from someone that has gotten around more than I.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Grant Fritchey - Tuesday, March 7, 2017 11:24 PM

    Jeff Moden - Tuesday, March 7, 2017 1:00 PM

    Grant Fritchey - Tuesday, March 7, 2017 2:15 AM

    Jeff Moden - Monday, March 6, 2017 9:12 PM

    spaghettidba - Thursday, January 15, 2015 4:40 AM

    It's a known issue. The TF resolves it, but IMHO a gap or jump in identity column should not be a big deal. Is this really an issue for you? If you are exposing your identity value to the end users, you're looking for trouble. Identity is a surrogate key, and as a surrogate it should never be shown to end users in any way. Use business (natural) keys for this.

    I have to challenge that advice, ol' friend.  What "natural" key would you use for a Customer, Employee table, or Invoice table, just to name a few?

    And, yeah, I know it's an old post but it was just necro'd and this is the first time I've seen it.

    You're still going to want some kind of constraint on the table on top of an identity column (presumably as the pk) to ensure business level uniqueness. This is regardless of if you expose an identity to users or not. Even if you do expose those numbers to the users, does it matter if there are gaps?

    Yep.  I get that.  Tell me what that constraint should be for a table with a million customers.

    Last name of course.

    But seriously, it depends on the data. In order to keep the data clean, natural constraints must be put in place as well as artificial ones. I'm not turning into Joe here. I just care about keeping the data as clean as possible.

    I absolutely agree but what would those natural constraints be for something like a Customer table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Wednesday, March 8, 2017 5:02 PM

    Understood and I said nothing contrary to that.  The questions remains, though.  What do you use when you don't use the ID number?

    Shall I repeat myself again? "Depends on the system and the client's requirements. "

    What I'm specifically looking for is the answer to the age old question of what someone would use for a non-incremental key on something like a "Customer" table.

    "Depends on the system and the client's requirements".

    I use identities for primary keys, but like Grant says, there should be another unique column for data integrity, and what it is will come out of a discussion on how the client identifies a customer

    I had this discussion with my grad class recently, and one thing that came up is that there doesn't have to be a one-one mapping between a customer and a person. My parents, for example, share an Amazon account. Amazon considers them one customer (email address being their identifier) and likely don't care at all that there are two real people behind it.

    A recent system I was involved with used GPS coordinate to uniquely identify a customer, it was for farm management. One person owns multiple farms?  Still considered multiple customers.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jeff Moden - Wednesday, March 8, 2017 8:00 PM

    Grant Fritchey - Tuesday, March 7, 2017 11:24 PM

    Last name of course.

    But seriously, it depends on the data. In order to keep the data clean, natural constraints must be put in place as well as artificial ones. I'm not turning into Joe here. I just care about keeping the data as clean as possible.

    I absolutely agree but what would those natural constraints be for something like a Customer table?

    The only way I can answer that is to start making some assumptions. Let's say the Customers we're talking about are businesses. Let's also assume that we have chosen, because more than one business can occupy a single address, that we're going to create an address table and that the customer table will have the AddressID (because, I'm fine with using ID columns). I'd make the natural key, the customer name and the address. That's a whole series of assumptions of the type of business, data, etc., but, to me, it's a question of identifying what we can, where we can, that will allow us to enforce the natural uniqueness too. There is always something that identifies that. We may just have to work hard to find it sometimes.

    For a Customer that's a person, it could be firstname, lastname, phonenumber, cause those sure aren't ever going to be duplicate. Firstname, lastname, email could also work. You can't depend on email alone because people might share it (I know a family that set up a single email address for everyone, what  PITA to communicate with the five of them), but adding on the names and suddenly it's unique.

    I've built systems where I only  had the ID columns on every table. I'm that guy. Hopefully, I've learned better now. While I may not want to use a natural key as the PK on a table and it might be a wretched bad choice for the clustered index, I'm still going to have that constraint there to ensure I have clean data because I've done the dirty data thing and sincerely regret it.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Raymond van Laake - Tuesday, March 7, 2017 2:27 AM

    For me it matters that there are gaps.... in our setup the identity is used as the employee number, and the employees use it to log in. Because MS suddenly introduced gaps I get questions from my customers what is going on. So it is not a showstopper, but an ugly "feature". It would have been nice if MS had come up with someting else.

    I can see how an accountant, a data analyst, or a CIO with Aspbergers might be unnerved by the sudden presence of "missing" IDs in a financial report. For that reason, it may be a good idea to design the system from start so that identity columns are routinely padded with a random number of gaps , just so no one cultivates an expectation for perfect sequentiality.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell - Thursday, March 9, 2017 7:10 AM

    I can see how an accountant, a data analyst, or a CIO with Aspbergers might be unnerved by the sudden presence of "missing" IDs in a financial report. For that reason, it may be a good idea to design the system from start so that identity columns are routinely padded with a random number of gaps , just so no one cultivates an expectation for perfect sequentiality.

    Intentionally wasting numbers just to cover bad design, really?

    There are still people arguying shat sequence where 1% numbers are gaps and where 98% are gaps are basically the same. They are basically the same, but effectively different.
    - Estimates of heuristics based on ranges go crazy when data density change by factor of 50x for part of the sequence. SQL server statistics are probably advanced enough to suck it up in most cases, but there is a lot more software out there in the world. And a lot of software simply can not be rewritten.
    - If You find out that You could change data type from int to small int, but You actually can not, because the sequence is two orders of magnitude inflated due to voids, it is not nice, You would need twice as much storage forever (500MB more in my case). Efficiency of Yout data storage is affected.
    - Now You can not use identity at tinyint at all, because server would eat up all numbers itselt. Using identity on smallint is badly affected in therms of available address space.

    The worst thing is, that this change is breaking unnecessarily, there are at least two easy ways to make it without significant negative impact. Firstly identity caching can be postponend until table has enought rows that it does not matter that much, for 1M rows table the skipping of 1k numbers is not big deal as it would not mean significant change in data distribution. Secondly identity can be automatically restored to the right value after server restart.

  • alejsek - Thursday, March 9, 2017 7:50 AM

    Eric M Russell - Thursday, March 9, 2017 7:10 AM

    I can see how an accountant, a data analyst, or a CIO with Aspbergers might be unnerved by the sudden presence of "missing" IDs in a financial report. For that reason, it may be a good idea to design the system from start so that identity columns are routinely padded with a random number of gaps , just so no one cultivates an expectation for perfect sequentiality.

    Intentionally wasting numbers just to cover bad design, really?

    There are still people arguying shat sequence where 1% numbers are gaps and where 98% are gaps are basically the same. They are basically the same, but effectively different.
    - Estimates of heuristics based on ranges go crazy when data density change by factor of 50x for part of the sequence. SQL server statistics are probably advanced enough to suck it up in most cases, but there is a lot more software out there in the world. And a lot of software simply can not be rewritten.
    - If You find out that You could change data type from int to small int, but You actually can not, because the sequence is two orders of magnitude inflated due to voids, it is not nice, You would need twice as much storage forever (500MB more in my case). Efficiency of Yout data storage is affected.
    - Now You can not use identity at tinyint at all, because server would eat up all numbers itselt. Using identity on smallint is badly affected in therms of available address space.

    The worst thing is, that this change is breaking unnecessarily, there are at least two easy ways to make it without significant negative impact. Firstly identity caching can be postponend until table has enought rows that it does not matter that much, for 1M rows table the skipping of 1k numbers is not big deal as it would not mean significant change in data distribution. Secondly identity can be automatically restored to the right value after server restart.

    If the difference in your case between storing an int and a smallint is 500MB you have too many rows for a smallint to be unique anyway so that argument doesn't hold water. If you are really worried about gaps and micromanaging values like this you should use a sequence instead of an identity.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 15 posts - 16 through 30 (of 50 total)

You must be logged in to reply to this topic. Login to reply