January 19, 2010 at 8:50 pm
Comments posted to this topic are about the item Retrieving Identities on set-based inserts
January 19, 2010 at 9:37 pm
I can see how this will return the set of inserted values. However, I am not sure that I understand just how we can match this set of inserted identity values to the rows we have just inserted. To elaborate - let's assume we are inserting two PERSON records. We have separate addresses for each PERSON. Both of the individuals are called "John Smith", with the same birth date. So how can we be sure that we have successfully matched each address to the right JOHN SMITH? Or must we be able to uniquely identify the inserted rows to be able to successfully match them to the identities associated with the inserted rows?
January 20, 2010 at 12:49 am
Good to see examples of the output clause.
The @@identity problem should be a non-problem if people just remembered to use scope_identity() instead. It will return the correct identity value to you, with no identities from triggers.
Just remember that using either @@identity or scope_identity() will still require some RBAR on multi-row inserts.
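For reference, here is a minimal sketch of the set-based alternative the article describes: the OUTPUT clause captures every inserted identity in one statement, with no RBAR. Table and column names here are made up for illustration.

```sql
-- Destination table with an identity column
DECLARE @Person TABLE (
    PersonId INT IDENTITY(1,1) PRIMARY KEY,
    Name     VARCHAR(50)
);

-- Table variable to catch the generated identities
DECLARE @NewIds TABLE (PersonId INT, Name VARCHAR(50));

-- One multi-row insert; OUTPUT captures all new identities at once
INSERT INTO @Person (Name)
OUTPUT inserted.PersonId, inserted.Name INTO @NewIds
SELECT 'Alice'
UNION ALL SELECT 'Bob';

SELECT PersonId, Name FROM @NewIds;
```

No loop, no @@identity, no scope_identity(): every identity generated by the statement lands in @NewIds.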
January 20, 2010 at 3:32 am
Paul Marriott-463518 (1/19/2010)
I can see how this will return the set of inserted values. However, I am not sure that I understand just how we can match this set of inserted identity values to the rows we have just inserted. To elaborate - let's assume we are inserting two PERSON records. We have separate addresses for each PERSON. Both of the individuals are called "John Smith", with the same birth date. So how can we be sure that we have successfully matched each address to the right JOHN SMITH? Or must we be able to uniquely identify the inserted rows to be able to successfully match them to the identities associated with the inserted rows?
There are a few options:
1) Put enough columns in the output clause to uniquely identify the inserted rows. This can get a bit messy, though, and if there's the possibility that two rows can be genuinely identical then you could end up with some very strange results.
2) Use ORDER BY on the SELECT statement used to insert your top-level data, which will guarantee your IDs match the row ordering you've chosen. Whether this is useful depends on the rest of your data.
3) Add a rowguid column to the tables where you need to do this, and add that to your output clause. That'll always do the trick.
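Option 3 can be sketched like this (all names are hypothetical): a UNIQUEIDENTIFIER gives each staged row a handle that survives the insert, so the two identical John Smiths can still be told apart.

```sql
-- Destination table: identity plus a rowguid to match on
DECLARE @Person TABLE (
    PersonId  INT IDENTITY(1,1) PRIMARY KEY,
    RowGuid   UNIQUEIDENTIFIER NOT NULL,
    Name      VARCHAR(50),
    BirthDate DATE
);

-- Staging table: each row gets a unique guid up front
DECLARE @Staging TABLE (
    RowGuid   UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
    Name      VARCHAR(50),
    BirthDate DATE,
    Address   VARCHAR(100)
);

DECLARE @Map TABLE (PersonId INT, RowGuid UNIQUEIDENTIFIER);

-- Two genuinely identical people, different addresses
INSERT INTO @Staging (Name, BirthDate, Address)
VALUES ('John Smith', '1970-01-01', '1 High St'),
       ('John Smith', '1970-01-01', '2 Low Rd');

-- Insert and capture identity + guid pairs
INSERT INTO @Person (RowGuid, Name, BirthDate)
OUTPUT inserted.PersonId, inserted.RowGuid INTO @Map
SELECT RowGuid, Name, BirthDate FROM @Staging;

-- Join back on the guid to attach each address to the right person
SELECT m.PersonId, s.Address
FROM @Map m
JOIN @Staging s ON s.RowGuid = m.RowGuid;
```

The guid is only needed for the duration of the load; it could live in the staging table alone if the destination schema can't change.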
Kev
January 20, 2010 at 3:47 am
kll (1/20/2010)
Good to see examples of the output clause. The @@identity problem should be a non-problem, if people just remembered to use scope_identity() instead. It will return the correct identity value to you. No identities from triggers.
If you're using the output clause then you don't need either of those. I'd still use scope_identity() for single-row inserts, though.
Just remember that using any kind of @@identity or scope_identity() will still require some RBAR on multi line inserts.
Surely the whole point of this article is that you don't need to do any RBAR. I use this technique a lot. The best result was on a stored proc that typically inserted about 500 top-level records, each consisting of several sub-hierarchies from one to four levels deep. Execution time went down from about 20 minutes to a few seconds.
January 20, 2010 at 4:45 am
Hi %
kll (1/20/2010)
Just remember that using any kind of @@identity or scope_identity() will still require some RBAR on multi line inserts.
It's possible to use set-based inserts if the table containing the new rows to be inserted has some kind of unique key (e.g. a surrogate id).
---========================================================
-- table definitions
-----------------------------------------------------------
-- destination table for new data to be inserted
DECLARE @Destination TABLE (
Id INT NOT NULL IDENTITY(1,1)
PRIMARY KEY CLUSTERED
,SomeInt INT
);
-----------------------------------------------------------
-- table containing all rows to be newly inserted into
-- destination table
-- IMPORTANT
-- the table containing the new data to be inserted requires
-- any kind of uniqueness (e.g. a surrogate id)
DECLARE @NewData TABLE (
SurrogateId INT NOT NULL
PRIMARY KEY CLUSTERED
,SomeInt INT
);
---========================================================
-- create some sample data
-----------------------------------------------------------
-- insert existing data into destination table
INSERT INTO @Destination
SELECT -777
UNION ALL SELECT -888;
-----------------------------------------------------------
-- new rows to be inserted into destination table
INSERT INTO @NewData
SELECT -1, 11
UNION ALL SELECT -2, 22;
---========================================================
-- bulk insert rows into destination table and catch new identities
-----------------------------------------------------------
-- variables to catch count of inserted rows and scope-identity
DECLARE
@LastIdentity INT
,@RowCount INT;
-----------------------------------------------------------
-- bulk insert new rows into destination table
-- IMPORTANT
-- to make the following operations work, be sure to use an
-- ORDER BY in the SELECT. To avoid problems due to parallelism,
-- use the MAXDOP 1 option, which should be okay for an INSERT
INSERT INTO @Destination
SELECT SomeInt
FROM @NewData
ORDER BY SurrogateId
OPTION (MAXDOP 1);
-----------------------------------------------------------
-- catch the last created scope-identity and the count of
-- rows inserted
SELECT
@LastIdentity = SCOPE_IDENTITY()
,@RowCount = COUNT(*)
FROM @NewData;
-----------------------------------------------------------
-- use a CTE that creates a row number ordered by the
-- surrogate id of newly inserted rows
WITH cte AS (
SELECT
SurrogateId
,ROW_NUMBER() OVER (ORDER BY SurrogateId) RowNum
FROM @NewData
)
SELECT
*
-- calculate the destination table IDENTITY values
,@LastIdentity - @RowCount + RowNum AS DestinationId
FROM cte;
Greets
Flo
January 20, 2010 at 7:44 am
I like the idea of using the OUTPUT clause but I have this bad feeling about the example. Perhaps there are requirements that justified doing things this way (insert all parent rows then insert all children), but I think this will cause problems with concurrency.
It seems we are always "fighting" against requirements like "add all records to the database; roll back all changes for the batch if any errors". When the batch gets large, the locking may escalate to table locks, and how large is too large depends on the resources available at the time.
I guess my point is to be aware that, if you can, try to make the unit of work one master and all its related child records.
January 20, 2010 at 8:46 am
This is a sidebar but....
I am struggling with how the CTE below references itself from within itself.
It works, but I did not realize this was possible.
Any clarification would be greatly appreciated.
;WITH mycte AS
(SELECT 100 AS seedValue
UNION ALL
SELECT seedValue + 1 FROM mycte WHERE seedValue + 1 < 200
)
select * from mycte
January 20, 2010 at 9:03 am
bobmcc-957585 (1/20/2010)
This is a sidebar but....I am struggling with how the CTE is referencing itself from within itself
This works but I need clarification on it. The CTE is referencing itself from within itself.
I did not realize this was possible.
Any clarification would be greatly appreciated.
;WITH mycte AS
(SELECT 100 AS seedValue
UNION ALL
SELECT seedValue + 1 FROM mycte WHERE seedValue + 1 < 200
)
select * from mycte
Hi Bob
This is a recursive CTE, which is a supported structure in T-SQL.
One nice use for it is navigating a parent-child relationship of uncertain depth. MSDN has an article on it at http://msdn.microsoft.com/en-us/library/ms186243.aspx
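To illustrate that parent-child use, here is a small sketch (the employee table and its contents are made up): the anchor member selects the root, and the recursive member repeatedly joins the CTE back to the table to fetch each next level.

```sql
-- Hypothetical employee hierarchy
DECLARE @Employee TABLE (EmpId INT, ManagerId INT NULL, Name VARCHAR(50));
INSERT INTO @Employee VALUES (1, NULL, 'CEO'), (2, 1, 'VP'), (3, 2, 'Dev');

WITH Org AS (
    -- anchor member: the top of the tree
    SELECT EmpId, ManagerId, Name, 0 AS Depth
    FROM @Employee
    WHERE ManagerId IS NULL
    UNION ALL
    -- recursive member: children of the rows found so far
    SELECT e.EmpId, e.ManagerId, e.Name, o.Depth + 1
    FROM @Employee e
    JOIN Org o ON e.ManagerId = o.EmpId
)
SELECT EmpId, Name, Depth FROM Org;
```

The recursion stops on its own when a pass finds no more children; your seedValue example works the same way, with the WHERE clause as the stopping condition.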
January 20, 2010 at 9:17 am
Thanks J!
I'm on my way to check out the article you recommended. I appreciate the reply.
..bob
January 20, 2010 at 9:50 am
Great article, J.
I had no idea about recursive CTEs.
Thanks for pointing me in the right direction!
January 20, 2010 at 10:06 am
A couple of notes on Florian's post:
The sample script works because you're using Table variables, which are local to the scope. If you use real tables, it is not guaranteed to work, as pointed out in the article, because the IDs of your insert are not guaranteed to be in order or consecutive, so you cannot calculate the IDs of the other rows from the ID of the last row.
Be careful not to confuse an insert of many rows in a single command with a BULK INSERT:
http://msdn.microsoft.com/en-us/library/ms188365.aspx
Dan Guzman - Not the MVP (7/22/2010)
All questions have to be prefaced by Server version and 'according to MS Docs' or 'my own personal opinion based on how much detail I felt like digging into at the time.'
January 20, 2010 at 10:20 am
Hi weitzera
weitzera (1/20/2010)
The sample script works because you're using Table variables, which are local to the scope. If you use real tables, it is not guaranteed to work, as pointed out in the article, because the IDs of your insert are not guaranteed to be in order or consecutive, so you cannot calculate the IDs of the other rows from the ID of the last row.
Be careful to not confuse an insert of many rows in a single command with a BULK INSERT
http://msdn.microsoft.com/en-us/library/ms188365.aspx
Unfortunately, I don't have any resource that guarantees this works or doesn't work. I can only say that I tried it in several combinations with real tables, different orders, and up to 50 concurrent processes, and it worked in all my tests. (I don't have a complete test environment at the moment.)
Greets
Flo
January 20, 2010 at 11:28 am
Great tip, however, it is important to keep in mind the warning from BOL:
An UPDATE, INSERT, or DELETE statement that has an OUTPUT clause will return rows to the client even if the statement encounters errors and is rolled back. The result should not be used if any error occurs when you run the statement.
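One way to respect that warning is to capture the OUTPUT rows into a table variable and only consume them after the statement has demonstrably succeeded, e.g. inside TRY/CATCH. A sketch under assumed table names (dbo.Destination, dbo.NewData are hypothetical):

```sql
DECLARE @NewIds TABLE (Id INT);

BEGIN TRY
    BEGIN TRAN;

    INSERT INTO dbo.Destination (SomeInt)
    OUTPUT inserted.Id INTO @NewIds
    SELECT SomeInt FROM dbo.NewData;

    COMMIT TRAN;

    -- Safe to use: the statement completed and was committed
    SELECT Id FROM @NewIds;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRAN;
    -- Do NOT use @NewIds here: table variables are not undone by
    -- ROLLBACK, so any captured values may reference rows that no
    -- longer exist in dbo.Destination.
END CATCH;
```

The key point is that the CATCH branch discards the captured set rather than returning it to the caller.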
January 20, 2010 at 12:52 pm
So I reread what I wrote, and there are actually two issues. The more important one has nothing to do with using tables versus table variables (you handle this correctly).
Depending on the query plan used, the identity values assigned to rows you insert may or may not be in order, as described in this article:
http://support.microsoft.com/?kbid=273586
You have handled this case by using the maxdop hint and explicitly ordering your rows on each insert.
The second issue is that identity values are assigned outside the transaction, so within a given transaction the identities are not guaranteed to be sequential if there is ever any other data modification against the table. This won't occur if you're using table variables, but it can if you're using tables that other queries have access to.
Try running the following queries simultaneously:
Setup
CREATE TABLE foo (fooID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, data1 TIMESTAMP, data2 INT, data3 CHAR(250))
GO
INSERT INTO foo(data2, data3)
SELECT CHECKSUM(name), name
FROM syscolumns
UNION ALL
SELECT CHECKSUM(name+'foo'), name+'foo'
FROM syscolumns
first connection:
SET IMPLICIT_TRANSACTIONS OFF
SET LOCK_TIMEOUT 5
GO
DECLARE @maxID BIGINT, @minID BIGINT
SELECT @maxID = MAX(id), @minID = MIN(ID) FROM sysobjects
INSERT INTO foo (data2, data3)
SELECT TOP(1) CHECKSUM(id), name
FROM sysobjects WHERE id >= (RAND()*(@maxID-@minID))+@minID
GO 20000
Second connection:
DECLARE @sid AS INT;
WITH CTE AS
(
SELECT ROW_NUMBER() OVER (ORDER BY name) AS row,
name
FROM sys.columns
)
INSERT INTO foo (data2, data3)
SELECT row, 'foo123'+name
FROM cte
ORDER BY row DESC
OPTION(MAXDOP 1)
SELECT @sid = SCOPE_IDENTITY()
SELECT fooid, fooid + data2 - @sid, data2
FROM foo
WHERE data3 LIKE 'foo123%'
AND fooid + data2 - @sid <> 1
If you run the second query by itself, it returns no rows. When I started the first query and immediately started the second before the first finished, I got identities that could no longer be computed from the scope identity.