Identity Column in SQL Server - Transaction - Multi-User

  • I have based some core logic in a web page on inserts into a table whose rows are identified by an ID column (int, identity, increment 1). The insert query runs in a transaction.

    My understanding is that in a multiuser environment, the stored procedure executed by a particular user would always have a set of Id's that would be contiguous. Here is what I mean-

    User 1 and User 2 hit the page simultaneously and insert 5 records each, with each set running in its own transaction scope. The set of IDs returned to user 1 would be 1 through 5, and to user 2, 6 through 10 (this is my understanding).

    Is there a possibility that user 1 would end up with records with IDs 1, 3, 4, 6, 7 and user 2 with IDs 2, 5, 8, 9, 10 (overlapping IDs)?

    All my tests have proved otherwise; that is, the series of numbers is always contiguous. But many members of my team think the ranges can overlap in a multi-user environment. I am not convinced. Is there a way to prove it either way?

    Table (Test_Table) structure

    id int Identity(1,1) PK
    Title varchar(50)
    SessionID varchar(50)

    Query

    BEGIN TRAN A
    INSERT INTO TEST_TABLE
    SELECT substring(TITLE,1,20), 'SessionId' FROM DBO.AnotherTestTable
    COMMIT TRAN A

    Note: I have also posted this question on other sites because of the urgency. Thanks.

  • I believe that to guarantee it you might need a table lock. Your code implies that the 5 records are inserted in a single statement... I've got a test case below that shows you need a table lock in the case where the records are not all inserted in a single statement...

    Open two query windows in SSMS. Switch both to TEMPDB.

    In the first window execute

    create table test_table(
        id int identity(1,1) primary key clustered,
        title varchar(50),
        sessionID varchar(50)
    )

    begin transaction

    insert into test_table
    select 'Insert #1', 'Session #1'
    UNION ALL
    select 'Insert #2', 'Session #1'

    waitfor DELAY '00:00:10'

    insert into test_table
    select 'Insert #3', 'Session #1'
    UNION ALL
    select 'Insert #4', 'Session #1'
    UNION ALL
    select 'Insert #5', 'Session #1'

    commit transaction

    Whilst that's executing (you've got a 10-second delay in the middle), execute the following in the second window

    begin transaction

    insert into test_table
    select 'Insert #1', 'Session #2'
    UNION ALL
    select 'Insert #2', 'Session #2'

    waitfor DELAY '00:00:10'

    insert into test_table
    select 'Insert #3', 'Session #2'
    UNION ALL
    select 'Insert #4', 'Session #2'
    UNION ALL
    select 'Insert #5', 'Session #2'

    commit transaction

    select * from test_table order by id
    drop table test_table

    You'll see, at least I did on my system, that the ranges are not contiguous.

    However if you add a

    with(tablock)

    after the table names in the insert statements then you serialize access to the table and all is well.
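    For clarity, here is what that hint looks like on one of the inserts above (a sketch; the hint goes right after the table name, and the lock is held for the duration of the statement):

    ```sql
    begin transaction

    -- The tablock hint takes a table-level lock for the insert, so two
    -- concurrent sessions cannot interleave their identity allocations.
    insert into test_table with(tablock)
    select 'Insert #1', 'Session #1'
    UNION ALL
    select 'Insert #2', 'Session #1'

    commit transaction
    ```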

    In short, I'm not sure you can rely on the behaviour of the query engine, and I haven't proven it either way with a single insert statement. Perhaps create a test case that takes a while to run, such as inserting 100000 records in two simultaneous query windows?

  • Actually, I surprised myself and bothered to do it 🙂

    Try this - same as before in tempdb with two query windows...

    Window #1

    create table test_table(
        id int identity(1,1) primary key clustered,
        num int,
        sessionID int
    )

    insert into test_table( num, sessionID )
    --select number, 1 from DoxKPH.dbo.numbers cross join numbers
    select top 1000000 c1.id, 1 from syscolumns c1 cross join syscolumns c2 cross join syscolumns c3

    Window #2

    insert into test_table( num, sessionID )
    select top 1000000 c1.id, 2 from syscolumns c1 cross join syscolumns c2 cross join syscolumns c3

    select sessionID, min(id) from test_table group by sessionID order by sessionID
    drop table test_table

    Each inserts a million rows into test_table. The minimum ID of each insert is then retrieved. The results on my machine were

    sessionID   Min Identity
    ---------   ------------
    1           1
    2           436254

    If your hypothesis were correct, the Min Identity for session #2 should have been 1000001, not 436254.

    Sorry to say, you cannot rely on contiguous identity values within a single insert on a busy table. 🙁 Good question though!!! 😀

  • Another issue you need to watch out for, if your requirement is a contiguous sequence of values in the identity column with no gaps in between:

    The allocation of a new identity value is NOT part of the transaction!

    If the transaction fails for some reason, the identity value is still consumed, but the row will obviously not appear in the table. That will leave you with gaps in the sequence.
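    A minimal sketch of that behaviour (the table name and values are made up for illustration):

    ```sql
    CREATE TABLE gap_demo (id INT IDENTITY(1,1) PRIMARY KEY, title VARCHAR(50))

    INSERT INTO gap_demo (title) VALUES ('first')        -- gets id = 1

    BEGIN TRAN
    INSERT INTO gap_demo (title) VALUES ('rolled back')  -- consumes id = 2
    ROLLBACK TRAN                                        -- row is gone, but id 2 is not reclaimed

    INSERT INTO gap_demo (title) VALUES ('third')        -- gets id = 3, not 2

    SELECT id, title FROM gap_demo                       -- ids 1 and 3; permanent gap at 2
    DROP TABLE gap_demo
    ```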

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables
    Persisting SQL Server Index-Usage Statistics with MERGE
    Turbocharge Your Database Maintenance With Service Broker: Part 2

  • Hi Ian,

    No matter what I do, I always get sequential IDs; the IDs do not overlap between connections. In your last post you are not running the inserts in a transaction.

    Here is what I did -

    Con1

    begin transaction

    insert into test_table( num, sessionID )
    select top 1000000 c1.id, 1 from syscolumns c1
    cross join syscolumns c2 cross join syscolumns c3

    waitfor DELAY '00:00:02'

    insert into test_table( num, sessionID )
    select top 1000000 c1.id, 1 from syscolumns c1
    cross join syscolumns c2 cross join syscolumns c3

    waitfor DELAY '00:00:02'

    insert into test_table( num, sessionID )
    select top 1000000 c1.id, 1 from syscolumns c1
    cross join syscolumns c2 cross join syscolumns c3

    waitfor DELAY '00:00:02'

    insert into test_table( num, sessionID )
    select top 1000000 c1.id, 1 from syscolumns c1
    cross join syscolumns c2 cross join syscolumns c3

    commit transaction

    Con2

    begin transaction

    insert into test_table( num, sessionID )
    select top 1000000 c1.id, 2 from syscolumns c1
    cross join syscolumns c2 cross join syscolumns c3

    insert into test_table( num, sessionID )
    select top 1000000 c1.id, 2 from syscolumns c1
    cross join syscolumns c2 cross join syscolumns c3

    insert into test_table( num, sessionID )
    select top 1000000 c1.id, 2 from syscolumns c1
    cross join syscolumns c2 cross join syscolumns c3

    commit transaction

    Query results

    select sessionID, min(id) as StartIndex, max(id) as EndIndex
    from test_table group by sessionID order by sessionID

    Result

    sessionID   StartIndex   EndIndex
    ---------   ----------   --------
    1           1            4000000
    2           4000001      7000000

  • From "INSIDE MICROSOFT SQL SERVER 2005 - T-SQL QUERYING" by Itzik Ben-Gan, Solid Quality Learning, pg 429:

    Another aspect of the IDENTITY property that can be considered a limitation in some cases is that identity values are assigned in an asynchronous manner. This means that multiple sessions issuing multirow inserts might end up getting nonsequential identity values.


  • Marios Philippopoulos (2/17/2008)


    From "INSIDE MICROSOFT SQL SERVER 2005 - T-SQL QUERYING" by Itzik Ben-Gan, Solid Quality Learning, pg 429:

    Another aspect of the IDENTITY property that can be considered a limitation in some cases is that identity values are assigned in an asynchronous manner. This means that multiple sessions issuing multirow inserts might end up getting nonsequential identity values.

    That's nice... and, although he's a famous and very well respected author, without demonstrable code to prove it one way or the other, it can still be wrong.

    For example, one very famous and very well respected author said that you don't need to preallocate any disk space when setting up a new database... that the autogrow features would take care of everything...

    ... it sure does... including 73 disk fragments just to get to the first Gig.

    And, how many times have you seen well-known authors recommend the use of Table Variables because they "live in memory and Temp Tables don't", which of course is incorrect.

    Extraordinary claims require extraordinary proof... show me the code 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... here's some extraordinary proof that supports what Itzik said...

    First, I made a table in a nice safe place... TempDB... couldn't use an actual temp table because I wanted it to be as normal as possible and I wanted multiple connections to be able to get to it (and didn't want to use a global temp, folks).

    --===== Identify the database to use (Tempdb is a nice safe place)
    USE TempDB

    --===== Create the test table
    CREATE TABLE DeleteMe
    (
        DeleteMeID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED WITH FILLFACTOR = 90,
        SomeStringData VARCHAR(10),
        Spid INT
    )

    Then, I opened 8 new windows in QA and entered/copied the following code to all 8 windows...

    --===== Identify the database to use (Tempdb is a nice safe place)
    USE TempDB

    WAITFOR TIME '11:27:00'

    INSERT INTO DeleteMe
    (SomeStringData,Spid)
    SELECT TOP 100
        'xxxxxx' AS SomeStringData,
        @@SPID
    FROM Master.dbo.SysColumns

    Basically, I have 8 connections that all entered 100 rows each at the same time. And, what Itzik said is true... you can't rely on the identity values being contiguous by connection... here's the proof I found in the DeleteMe table...

    DeleteMeID  SomeStringData  Spid
    ----------  --------------  ----
    336         xxxxxx          69
    337         xxxxxx          69
    338         xxxxxx          69
    339         xxxxxx          69
    340         xxxxxx          69
    341         xxxxxx          67
    342         xxxxxx          71
    343         xxxxxx          72
    344         xxxxxx          64
    345         xxxxxx          68
    346         xxxxxx          70
    347         xxxxxx          69
    348         xxxxxx          65
    349         xxxxxx          65
    350         xxxxxx          65

    (15 row(s) affected)

    We could stop there because a single insert is an implicit transaction. But, what the heck... I modified the code in the 8 windows to do an explicit transaction, just to make sure...

    --===== Identify the database to use (Tempdb is a nice safe place)
    USE TempDB

    WAITFOR TIME '11:40:00'

    BEGIN TRANSACTION

    INSERT INTO DeleteMe
    (SomeStringData,Spid)
    SELECT TOP 100
        'xxxxxx' AS SomeStringData,
        @@SPID
    FROM Master.dbo.SysColumns

    COMMIT

    ... truncated the DeleteMe table and, again, the 8 scripts all ran at the same time... and, here's what I got...

    DeleteMeID  SomeStringData  Spid
    ----------  --------------  ----
    336         xxxxxx          65
    337         xxxxxx          65
    338         xxxxxx          65
    339         xxxxxx          65
    340         xxxxxx          65
    341         xxxxxx          68
    342         xxxxxx          72
    343         xxxxxx          70
    344         xxxxxx          71
    345         xxxxxx          64
    346         xxxxxx          69
    347         xxxxxx          69
    348         xxxxxx          69
    349         xxxxxx          69
    350         xxxxxx          69

    (15 row(s) affected)

    So, NOW we can say with some great conviction ('cause we've seen it; it's no longer someone's word) that Itzik is right, in this case... whether you use an implicit or explicit transaction, there is no guarantee that identities will be contiguous for any given insert.

    Sorry for all the hoo-hah about trusting authors... I've just been burned too many times by authors who have been wrong.

    --Jeff Moden



  • Point well taken... 🙂


  • Jeff, the requirement in my case is not to have contiguous IDs, but to be 100% sure that there is no overlap of IDs between the two transactions.

    As an example - if for transaction 1, the id's are spread between 51-150 and transaction two between 151-250, I would stick to the logic I have built for the web page otherwise I would have to change it.

    The following is not acceptable -

    First Transaction Id's - 51-80, 100-169

    2nd Transaction Id's - 81-99 and 170-249

    I hope I am being clear, otherwise I would be happy to clarify. Thanks.

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE - What do you think of using the isolation level as SERIALIZABLE? In this case the set of Id's would be sequential.

    Any performance issues I need to worry about using this?
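    For what it's worth, a sketch of what that attempt would look like (not a guarantee: SERIALIZABLE constrains the key ranges a transaction reads, not how identity values are handed out, so it may not keep the ranges contiguous by itself):

    ```sql
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

    BEGIN TRANSACTION
    INSERT INTO test_table (num, sessionID)
    SELECT TOP 1000000 c1.id, 1
    FROM syscolumns c1 CROSS JOIN syscolumns c2 CROSS JOIN syscolumns c3
    COMMIT TRANSACTION

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED  -- back to the default
    ```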

  • atarafder (2/17/2008)


    Jeff, the requirement in my case is not to have contiguous ID's but to be 100% sure that there are no overlap of Id's between the two transactions.

    Um... same thing, I believe... Non-contiguous means there WILL be an overlap.

    --Jeff Moden



  • Is there a possibility that the user1 would have records with Id 1, 3, 4, 6, 7 and user2 with Ids- 2,5,8,9,10? (overlapping Id's)

    If the example you quote here is what you are worried about, then yes, this sort of thing can and will happen, as the examples above have shown.

    If by overlapping ids you mean that 2 sessions will accidentally hit the SAME identity value, then, no, this will not happen.
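    If the web-page logic only needs to know exactly which IDs its own insert produced, one option (on SQL Server 2005 and later) is to capture them with the OUTPUT clause rather than assuming a contiguous range. A sketch against the original Test_Table:

    ```sql
    DECLARE @ids TABLE (id INT)

    -- OUTPUT returns the identity values actually assigned to this
    -- statement's rows, regardless of what other sessions are doing.
    INSERT INTO Test_Table (Title, SessionID)
    OUTPUT inserted.id INTO @ids
    SELECT SUBSTRING(TITLE, 1, 20), 'SessionId'
    FROM dbo.AnotherTestTable

    SELECT id FROM @ids  -- exactly the IDs this session inserted
    ```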


  • Hi Jeff

    I am a frequent lurker and occasional poster on this site. Firstly, I would like to thank you and others for the invaluable info that you post here - I have learned a lot.

    Having said that I have a potentially dumb question:

    you frequently refer to TempDB as a nice safe place to test code etc. Given that it is a system database, and is used by other databases on the server, is it really a better place to test code than, say, a "test database" created specifically for the purpose of testing, which could be dropped if necessary? I'm interested to learn the reasoning for this.

    TIA

  • First, I love your "handle"... "Ivana Noh"... that's outstanding. Someday, ya gotta tell me at least your first name.

    Your observation is spot on... It would be MUCH better to have a test database... But, there's the rub... What is the name of your test database? And, if I need to create a real table and drop it in code, I can't take the chance on whether someone is actually working from a test database. I have had people who aren't fully aware of what they're doing actually run test code in a production environment. If I were to write code that created, used, and dropped a table called "Employee" or "Customer", and that type of uninformed steward ran the code, there's a pretty good chance that my code would mess up someone's data.

    So, to protect the, shall we say, lesser-informed folks, I run the code over in TempDB just to keep the chances of damaging someone's data as low as I possibly can, even if they've made the mistake of logging in as "SA"... I know it's not my data, and anyone who runs test code from a forum on a production database is just begging for trouble, but data is data and, as corny as it sounds, I'm sworn to protect it even if it's not mine.

    So, that's why... it's to protect folks who just might not know better. The worst they may have to do to get out of trouble is bounce the service so that TempDB rebuilds. And folks who are aware will simply change the USE statement to point to their correct database instead of TempDB... or not. Either way, things are pretty safe that way.

    Thanks for the compliment, and don't lurk so much... you ask good questions and probably have some good answers of your own... join the "crowd".

    --Jeff Moden



  • Hi guys,

    I know this is an old post, but I've got some more info.

    As you would expect setting isolation level to SERIALIZABLE or using HOLDLOCK or SERIALIZABLE hints on the INSERT statements does not help as the data that is being inserted in the different INSERT statements may be unrelated.

    TABLOCK hint causes deadlocks with the example code.

    The only way I found to totally isolate the inserts and avoid interleaved identity IDs was to use a TABLOCKX hint on the INSERT statement. Obviously this has huge repercussions for concurrency.
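    For reference, the hinted version James describes would look something like this (a sketch based on the earlier test code):

    ```sql
    BEGIN TRANSACTION

    -- TABLOCKX takes an exclusive table lock, so only one session can insert
    -- at a time; the identity values per statement come out contiguous, at
    -- the cost of serializing all writers on the table.
    INSERT INTO test_table WITH (TABLOCKX) (num, sessionID)
    SELECT TOP 1000000 c1.id, 2
    FROM syscolumns c1 CROSS JOIN syscolumns c2 CROSS JOIN syscolumns c3

    COMMIT TRANSACTION
    ```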

    Cheers,

    James
