September 10, 2010 at 4:18 pm
I've googled this and found plenty of preaching on avoiding cursors in small-scale applications, but I can't seem to find a relevant example on a set-based solution to insert multiple rows into multiple tables for each row in a result set.
For every result, I need exactly one entry in tblParent, and two rows in tblChild with a foreign key baack to tblParent. For the sake of argument the DDL is
CREATE TABLE tblParent (pkParentId bigint IDENTITY(1, 1), field1 int, field2 int, fkTempId int)
CREATE TABLE tblChild(pkChildId bigint IDENTITY(1, 1), fkParentId bigint, theVal varchar(3))
There is a foreign key relationship between tblParent and tblChild.
I have faced this challenge a few times over the course of my development cycle, and come up with several strategies. Initially, I added fkTempId as a permanent column to tblParent which was populated only during this type of insert. I created a temp id or rownumber in my result set and then something like this:
DECLARE @tbl TABLE (pkSourceId int IDENTITY(1,1), field1 int, field2 int, childval1 varchar(3), childval2 varchar(3))
UPDATE tblParent SET fkTempId = NULL
INSERT INTO @tbl(field1, field2, childVal1, childVal2)
SELECT field1, field2, childVal1, childVal2
FROM BigQuery
INSERT INTO tblParent(field1, field2, fkTempId)
SELECT field1, field2, pkSourceId
FROM @tbl
INSERT INTO tblChild (fkParentId, theVal)
SELECT p.pkParentId, t.childVal1
FROM tblParent p
INNER JOIN @tbl t ON p.fkTempId = t.pkSourceId
INSERT INTO tblChild (fkParentId, theVal)
SELECT p.pkParentId, t.childVal2
FROM tblParent p
INNER JOIN @tbl t ON p.fkTempId = t.pkSourceId
UPDATE tblParent SET fkTempId = NULL
So now the tables each have tens of millions of rows and this approach of locking the table and updating the tempId takes too long. I came up with a cursor based solution that does not require the fkTempId on tblParent, but I was hoping for some insight about a set-based approach to handle this. I don't see how to temporarily hold the identity of my new inserts into tblParent. NewId() looked promising, but I would have to make pretty big structural changes to implement anything with that. I'd like to keep my primary keys as bigint. I would greatly appreciate any assistance with this.
September 10, 2010 at 4:35 pm
Check out the OUTPUT clause of the INSERT statement (this is what it's there for!).
(well, maybe other things to, but definitely this.)
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
September 10, 2010 at 7:28 pm
Awesome! Thank you!
September 10, 2010 at 8:48 pm
No problem.
But... please follow up and let us know how it works out for you.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
September 11, 2010 at 5:35 am
Hi Wayne, I created found a good article on this and made a proof of concept that works well with simple data sets. However, one underlying assumption of the article is that the inserted data has a natural key AND an Identity.
CREATE TABLE dbo.tblParent (pkParentId bigint IDENTITY(1, 1), field1 int)
CREATE TABLE dbo.tblChild(pkChildId bigint IDENTITY(1, 1), fkParentId bigint, theVal varchar(3))
DECLARE @tbl TABLE (pkSourceId int IDENTITY(1,1), field1 int, childval1 varchar(3), childval2 varchar(3), InsertedId bigint)
INSERT INTO @tbl (field1, childval1, childval2)
VALUES (10,'123', 'ABC')
INSERT INTO @tbl (field1, childval1, childval2)
VALUES (20,'234', 'DEF')
INSERT INTO @tbl (field1, childval1, childval2)
VALUES (30,'345', 'GHI')
INSERT INTO @tbl (field1, childval1, childval2)
VALUES (40,'456', 'JKL')
DECLARE @InsertedRows TABLE(pkParentId bigint, pkSourceId int IDENTITY(1,1))
INSERT INTO tblParent(field1)
OUTPUT inserted.pkParentId
INTO @InsertedRows
SELECT field1
FROM @tbl tmp
ORDER BY tmp.pkSourceId --????????Does this make the following UPDATE correct?????????????????????
UPDATE tmp SET InsertedId = i.pkParentId
FROM @tbl tmp INNER JOIN @InsertedRows i ON tmp.pkSourceId = i.pkSourceId
INSERT INTO tblChild (fkParentId, theVal)
SELECT t.InsertedId, t.childVal1
FROM @tbl t
INSERT INTO tblChild (fkParentId, theVal)
SELECT t.InsertedId, t.childVal2
FROM @tbl t
SELECT * FROM tblParent
SELECT * FROM tblChild
In the real world version of this, I am entering financial transactions and detail lines whose unique identifier won't be known until after the insert and some triggers fire. Essentially, there's no natural key but the pkParentId. Do you think the highlighted ORDER BY is sufficient to guarantee that the order of the INSERTS will be the same as the order of entries in my temporary table "@tbl". Alternately, I have one field on tblParent I can temporarily "fake out" by putting in newId field and then updating it in a second pass, to the real value I want. Thank you very much for your help.
September 12, 2010 at 1:48 pm
Jason Akin (9/11/2010)
Do you think the highlighted ORDER BY is sufficient to guarantee that the order of the INSERTS will be the same as the order of entries in my temporary table "@tbl".
Possibly, if you use OPTION (MAXDOP 1) to prevent parallelism - otherwise parallel threads can be spawned, and then all bets are off.
However, I would not trust this. You truly do not have a natural key for this data?
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
September 12, 2010 at 3:15 pm
Unfortunately, no. It's financial transactions that are basically a race to the database. They have timestamps, but really just the Identity field is the only way to uniquely identify them. We have a few other fields that could be used as a composite key, but just like the primary key insertion, I won't know the unique values until the INSERT is complete and the triggers fire to generate the composite key. What do you think about the idea of temporarily stashing a NewId() in one of the varchar fields, doing my Child INSERTs and then updating the varchar on Parent? It's inelegant, but I'd bet good money it's cheaper than the cursor and several thousand INSERTs.
September 12, 2010 at 4:32 pm
The problem is that in order to really use the output clause to capture the identity column, and then subsequently use it in a child insert, you do need something to be able to tie it back to.
You mentioned NewID()... what's the chances that you have a rowguid column in the tblParent table?
I do like the idea of using the NewID()... I would do it like this:
DECLARE @tbl TABLE (
pkSourceId int IDENTITY(1,1),
field1 int,
childval1 varchar(3),
childval2 varchar(3),
InsertedId bigint,
RowID uniqueidentifier NOT NULL UNIQUE DEFAULT NewID());
INSERT INTO @tbl (field1, childval1, childval2)
VALUES (10,'123', 'ABC')
INSERT INTO @tbl (field1, childval1, childval2)
VALUES (20,'234', 'DEF')
INSERT INTO @tbl (field1, childval1, childval2)
VALUES (30,'345', 'GHI')
INSERT INTO @tbl (field1, childval1, childval2)
VALUES (40,'456', 'JKL')
select * from @tbl
Though I'm sure someone else will come along with a better idea... (is someone out there listening???)
Another thought... is all the data in a row unique? If so, you could output and match on all of the columns.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply