Issue with Identity Specification on Primary Key column...table has millions of rows

  • Hello everyone! Long time no talk - long story for another time...

    Have a customer with a database on SQL Server 2005 Standard Edition (SP3), and they are experiencing a great deal of difficulty with one of their tables, which has begun to receive millions of rows of data a day (when I say millions, I mean a million plus, but not more than 2 million rows a day).

    The issue is with the ID column, the primary key, which is an INT. They are having to reseed often because they are running out of numbers, and for some reason (and this is the main problem) they are experiencing skips in the ID numbers being issued (I checked the increment and all that - all seems to be correct there). I have suggested the following things:

    - Change the column data type from INT to BIGINT, and start seeding over from 1 in new server.

    - Range Partitioning for the table by year/quarter/month (still being decided on)

    - Check the .NET data-access factories to make sure the default isolation level (READ COMMITTED) has not been changed.

    They are having me get things in order to move this to a new server, and I want to really give it the best chance possible to not only thrive in performance, but to eliminate this issue. Any and all help is appreciated. Please ask any questions necessary to query me on this scenario, and I cannot thank you all enough!

    Rich

  • Gaps in a column with the identity property set usually mean one of two things:

    1. Deletes

    2. Failed inserts

    So I'd check how many deletes are being done and how many insert failures they have. If there are over a million inserts a day then I'd bet there are plenty of failed inserts.

    I'd change to BIGINT for sure. If you are moving to a new box, are you upgrading to 2012? You might want to investigate using a SEQUENCE instead of IDENTITY.
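    If they did end up on 2012, a SEQUENCE would look something like this (all object names here are made up for illustration). One caveat: a sequence caches values by default, so an unclean shutdown can itself create gaps.

    ```sql
    -- SQL Server 2012+ only; all object names are hypothetical.
    CREATE SEQUENCE dbo.RowID AS BIGINT
        START WITH 1
        INCREMENT BY 1
        CACHE 1000;           -- cached values lost in a crash become gaps

    -- Used as a default in place of the IDENTITY property:
    CREATE TABLE dbo.BigTable (
        ID BIGINT NOT NULL
            CONSTRAINT DF_BigTable_ID DEFAULT (NEXT VALUE FOR dbo.RowID)
            CONSTRAINT PK_BigTable PRIMARY KEY CLUSTERED,
        Payload VARCHAR(100) NOT NULL
    );
    ```

    Unlike IDENTITY, the sequence lives independently of any one table, so it can be shared, reset, or monitored without touching the table itself.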

  • Jack, thank you for the response. The new system is still on SQL 2005, but will be Enterprise Edition instead of Standard Edition. Also - I am trying to see if we cannot go with SSDs for the RAID array. The unfortunate thing for me with this system is that it is eventually going to be phased out in 12-24 months for a MySQL backend, but that's not really important now.

    What would you recommend as the best way to see if I am in fact getting failed inserts and deletes? The DBA before me set up some jobs that run a Trace every hour for 1 minute, but in my initial review of those captures I did not see anything along the lines of deletes or anything to indicate failed inserts - then again, I may not be looking at the captured data correctly.

    Any other advice would be much appreciated!

    Rich

  • how much of an impact would it be to change the column to BIGINT?

    if there's an application that consumes the data, that would need to change to use Int64, but that'd be the way i lean for fixing it.
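    as a rough sketch of the change itself (table and constraint names made up), the PK constraint has to be dropped and recreated because the key column's type is changing, and the ALTER rewrites the data, so a table this size needs a maintenance window:

    ```sql
    -- All names are hypothetical; any foreign keys or other indexes on ID
    -- would also have to be dropped and rebuilt around the ALTER.
    ALTER TABLE dbo.BigTable DROP CONSTRAINT PK_BigTable;
    ALTER TABLE dbo.BigTable ALTER COLUMN ID BIGINT NOT NULL;
    ALTER TABLE dbo.BigTable ADD CONSTRAINT PK_BigTable PRIMARY KEY CLUSTERED (ID);
    ```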

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Rich Yarger (4/12/2012)


    Jack, thank you for the response. The new system is still on SQL 2005, but will be Enterprise Edition instead of Standard Edition. Also - I am trying to see if we cannot go with SSDs for the RAID array. The unfortunate thing for me with this system is that it is eventually going to be phased out in 12-24 months for a MySQL backend, but that's not really important now.

    What would you recommend as the best way to see if I am in fact getting failed inserts and deletes? The DBA before me set up some jobs that run a Trace every hour for 1 minute, but in my initial review of those captures I did not see anything along the lines of deletes or anything to indicate failed inserts - then again, I may not be looking at the captured data correctly.

    Any other advice would be much appreciated!

    Rich

    Keep in mind that sometimes "failed inserts" means "user started something, and then cancelled out," which would not show up in a database log. If you reserve the IDs while you are processing, the identity seed will increment regardless of whether you decide to ACTUALLY insert.

    To get a full picture you'd have to see how to capture how many times someone cancels out of a screen (or cancels with a dirty data set not committed).
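    This is easy to see for yourself on a throwaway table (names made up): a rolled-back insert still consumes the identity value, so the next successful insert leaves a gap.

    ```sql
    -- Demo: identity values are not handed back on ROLLBACK.
    CREATE TABLE dbo.GapDemo (ID INT IDENTITY(1,1) PRIMARY KEY, Val VARCHAR(10));

    INSERT INTO dbo.GapDemo (Val) VALUES ('A');   -- gets ID 1

    BEGIN TRANSACTION;
    INSERT INTO dbo.GapDemo (Val) VALUES ('B');   -- consumes ID 2
    ROLLBACK TRANSACTION;                         -- row is gone, ID 2 is not reused

    INSERT INTO dbo.GapDemo (Val) VALUES ('C');   -- gets ID 3

    SELECT ID, Val FROM dbo.GapDemo;              -- rows 1 and 3; gap at 2
    ```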

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I just received some information from one of my colleagues who is looking at the ID skips with me, and he has indicated that he does not believe Jack's initial assessment fits our issue. He has explained that, as far as he knows, we do not do any deletes from the table at all, and that the skips in IDs are in the millions, so it would be hard to imagine millions of failed inserts taking place through the system.

    The other issue I am dealing with is that the proxy classes the programmers have set up are typed as INT, not BIGINT. I'm getting a lot of push back over changing the data type because they do not want to change it in their code.

    So - what would you recommend now?

  • Rich Yarger (4/11/2012)


    ...their tables, that has begun to receive millions of rows of data to this one table, a day (when I say millions I mean a million plus, but not more than 2 million rows a day)..

    Rich

    if you are seeing gaps in the millions, and they are getting a million+ rows a day, it sounds like the ETL process is starting a big transaction, and rolling back/erroring without it being obvious.

    if you are inserting 1M+ at a time, isn't that going to limit you to roughly 2,000 of those inserts before you hit the upper limit of an INT datatype (2,147,483,647) anyway?

    i think you might have no choice but to have the developers start switching it to BIGINT/Int64 support.

    Lowell



  • Rich Yarger (4/12/2012)


    I just received some information from one of my colleagues that is looking at the ID skips with me, and he has indicated that he does not believe that Jack's initial assessment fits our issue. He has explained that as far as he knows, we do not do any deletes from the table at all, and that the skips in ID's are in the millions, so it would be hard to imagine millions of failed inserts taking place through the system.

    The other issue I am dealing with is that the proxy classes the programmers have set up are typed as INT, not BIGINT. I'm getting a lot of push back over changing the data type because they do not want to change it in their code.

    So - what would you recommend now?

    Okay, so I provided the most common ways I've seen that gaps appear in columns using the IDENTITY property. Here are the other ways I know of that would create gaps:

      1. Setting the identity to start somewhere other than 1 either at creation or using DBCC CHECKIDENT with the RESEED option.

      2. Setting the increment to something other than 1. I don't know of a way to change the increment once it is set without recreating the column, so if that were the cause it would still be happening, and the gaps would be perfectly regular.

      3. Explicitly entering data into the column using the IDENTITY property using SET IDENTITY_INSERT table_name ON.
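    The first and third of those are easy to demonstrate on a scratch table (names made up for illustration):

    ```sql
    -- Assumes a scratch table dbo.GapDemo with an INT IDENTITY(1,1) column ID
    -- and a VARCHAR column Val.

    -- Reseeding above the current identity value leaves a hole:
    DBCC CHECKIDENT ('dbo.GapDemo', RESEED, 1000);
    INSERT INTO dbo.GapDemo (Val) VALUES ('X');   -- gets ID 1001

    -- An explicit insert pushes the identity forward past the gap it creates:
    SET IDENTITY_INSERT dbo.GapDemo ON;
    INSERT INTO dbo.GapDemo (ID, Val) VALUES (5000, 'Y');
    SET IDENTITY_INSERT dbo.GapDemo OFF;          -- next identity continues from 5000
    ```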

    Of all 5 ways (including the 2 earlier), I still think FAILED/ABORTED inserts and SET IDENTITY_INSERT ON are the 2 most likely.

    Maybe someone else knows another way for gaps to appear in a column with the IDENTITY property set.

    As far as checking for failed transactions you can use SQL Trace and collect these events:

      RPC:Starting and RPC:Completed Events -- if using SP's or SQL Commands properly

      SQL:BatchStarting and SQL:BatchCompleted -- if not using SP's

      SQLTransaction - to catch implicit transactions. You'll want the Event Subclass column as that will tell you BEGIN, COMMIT, or ROLLBACK

      TM: Rollback Tran - to catch explicit transactions and rollbacks.

    Then you'd want to filter on Database Name and probably Application Name to limit the data collected.

    If the data is being inserted in sets then a single insert that is inserting a million rows would create a million row gap.
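    Before setting up the trace, it may also be worth sizing and locating the existing gaps. On 2005 there is no LAG(), but a ROW_NUMBER() self-join works; table and column names here are assumed:

    ```sql
    -- Finds each gap and its size; assumes dbo.BigTable with an indexed ID column.
    WITH Numbered AS (
        SELECT ID, ROW_NUMBER() OVER (ORDER BY ID) AS rn
        FROM dbo.BigTable
    )
    SELECT a.ID            AS GapStartsAfter,
           b.ID            AS NextID,
           b.ID - a.ID - 1 AS MissingCount
    FROM Numbered a
    JOIN Numbered b ON b.rn = a.rn + 1
    WHERE b.ID - a.ID > 1
    ORDER BY MissingCount DESC;
    ```

    A handful of million-row gaps would point at large rolled-back batches; one-off jumps to round numbers would point more at reseeds or IDENTITY_INSERT.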

  • Hi Jack - thank you (and all of you) for your input on this. I appreciate it, and apologize if I came off at all as unappreciative. I am only on day 4 here at this client's site, so I am going to gather some more details on how this is happening, and see whether it still occurs from a fresh start after we install the new environment. I will take into account all the advice on tracing and what will and will not show up there, and get back to this thread in about 12-24 hours (big database backup that is going to take some time to restore).

    Thanks again everyone!

    Rich

  • You didn't come across as unappreciative. I just don't believe your colleague because there aren't a whole lot of ways to get gaps in a column with the IDENTITY property and really DELETEs and failed INSERTs are the most common.
