September 19, 2007 at 10:58 am
I am writing a script to import a client's entire local db into our onsite db. This consists of 50+ tables. Most tables take seconds to a couple of minutes to import, but in this one ugly case it is taking hours to import 172,000 records. We need to take down the production server in order to run this, so obviously this is not good.
What complicates the scenario is the need to keep a mapping of old ids to new ids. Originally, when I saw how slow this was, I tried a bulk insert instead. That worked and was fast, but the primary key is an IDENTITY column and I couldn't reliably map the old ids to the new ones after the insert.
Does anyone have any input on how to make this faster, or on going back to the bulk insert with a reliable way to map the old/new ids afterwards?
DECLARE @ctr int, @recCount int, @strHospIdLow varchar(20), @strHospIdHigh varchar(20)
SET @strHospIdLow = '0200'
SET @strHospIdHigh = '0250'

-- Mapping table: one row per old Audit id; idNew is filled in as each row is inserted
CREATE TABLE dbo.MapIds_Audit
(recNum int IDENTITY, idOld varchar(32), idNew varchar(32))

-- Seed the map with every old Audit id in the hospital range
INSERT INTO MapIds_Audit (idOld)
SELECT Audit_SqlID
FROM dbOld.dbo.Audit A
JOIN dbOld.dbo.AuditSession AuS ON A.AuditSession_SqlId = AuS.AuditSession_SqlId
WHERE AuS.Hosp_id BETWEEN @strHospIdLow AND @strHospIdHigh AND LEN(AuS.Hosp_id) = 4

SET @ctr = 1
SET @recCount = (SELECT MAX(recNum) FROM MapIds_Audit)

-- Copy one Audit row per iteration, capturing its new identity value in the map
WHILE @ctr <= @recCount
BEGIN
    INSERT INTO dbNew.dbo.Audit
        (AuditSession_SqlId, fieldA, fieldB, fieldC, fieldD, fieldE, fieldF, fieldG)
    SELECT
        SESS.idNew
        ,fieldA
        ,fieldB
        ,fieldC
        ,fieldD
        ,fieldE
        ,CASE A.Section
            WHEN 'C' THEN C.idNew
            WHEN 'AR' THEN AR.idNew
            WHEN 'A' THEN M.idNew
            ELSE ''
         END
        ,fieldG
    FROM dbOld.dbo.Audit AS A
    JOIN MapIds_Audit AS I ON A.Audit_SqlID = I.idOld
    JOIN MapIds_AuditSession AS SESS ON A.AuditSession_SqlId = SESS.idOld
    LEFT OUTER JOIN MapIds_MReport M ON A.Report_ID = M.idOld
    LEFT OUTER JOIN MapIds_CReport C ON A.Report_ID = C.idOld
    LEFT OUTER JOIN MapIds_AReport AR ON A.Report_ID = AR.idOld
    WHERE I.recNum = @ctr

    UPDATE MapIds_Audit SET idNew = SCOPE_IDENTITY() WHERE recNum = @ctr

    SET @ctr = @ctr + 1
END
September 19, 2007 at 11:18 am
Looping and inserting 1 record at a time is never going to give you the performance you require.
I would revert to the bulk insert into a staging table, and then remap the IDs via a set-based operation. What was unreliable about mapping the old/new IDs using the bulk insert method?
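Roughly something like this for the first half (untested sketch; dbo.Audit_Staging is a made-up name, and the columns and range variables are lifted from the script above). The WHILE loop then collapses into a single INSERT ... SELECT against the staging table, and filling in the new ids becomes one set-based operation instead of 172,000 SCOPE_IDENTITY() calls:

-- Bulk copy the client's rows into a local staging table, keeping the old key as ordinary data
SELECT A.Audit_SqlID AS idOld, A.AuditSession_SqlId, A.Section, A.Report_ID,
       A.fieldA, A.fieldB, A.fieldC, A.fieldD, A.fieldE, A.fieldG
INTO dbo.Audit_Staging
FROM dbOld.dbo.Audit AS A
JOIN dbOld.dbo.AuditSession AS AuS ON A.AuditSession_SqlId = AuS.AuditSession_SqlId
WHERE AuS.Hosp_id BETWEEN @strHospIdLow AND @strHospIdHigh AND LEN(AuS.Hosp_id) = 4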
September 19, 2007 at 11:47 am
In some cases multiple records have duplicate data (even the timestamp), so I couldn't figure out how to match the newly inserted records back to the old table.
I will put some thought into the staging table...
September 19, 2007 at 12:13 pm
Well, I got it down to just over 2 minutes by adding a column to the new table to store the old id, then using that after the bulk insert to update my mapping table (and then dropping the new column). This is great if "the powers that be" will let me manipulate the table structure in this way...
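In outline, that approach looks something like this (untested sketch; idOld_tmp is a made-up column name, and it assumes the identity column in the new table is also called Audit_SqlID):

-- 1. Temporary column on the target table to carry the old key through the bulk insert
ALTER TABLE dbNew.dbo.Audit ADD idOld_tmp varchar(32) NULL

-- 2. The same insert as the original loop, but set-based (no recNum filter) and
--    carrying the old id along with each row
INSERT INTO dbNew.dbo.Audit
    (AuditSession_SqlId, fieldA, fieldB, fieldC, fieldD, fieldE, fieldF, fieldG, idOld_tmp)
SELECT
    SESS.idNew
    ,A.fieldA
    ,A.fieldB
    ,A.fieldC
    ,A.fieldD
    ,A.fieldE
    ,CASE A.Section
        WHEN 'C' THEN C.idNew
        WHEN 'AR' THEN AR.idNew
        WHEN 'A' THEN M.idNew
        ELSE ''
     END
    ,A.fieldG
    ,A.Audit_SqlID
FROM dbOld.dbo.Audit AS A
JOIN MapIds_Audit AS I ON A.Audit_SqlID = I.idOld
JOIN MapIds_AuditSession AS SESS ON A.AuditSession_SqlId = SESS.idOld
LEFT OUTER JOIN MapIds_MReport M ON A.Report_ID = M.idOld
LEFT OUTER JOIN MapIds_CReport C ON A.Report_ID = C.idOld
LEFT OUTER JOIN MapIds_AReport AR ON A.Report_ID = AR.idOld

-- 3. Fill the mapping table in a single set-based UPDATE, joining on the carried-along old id
UPDATE I
SET I.idNew = N.Audit_SqlID
FROM MapIds_Audit AS I
JOIN dbNew.dbo.Audit AS N ON N.idOld_tmp = I.idOld

-- 4. Drop the temporary column once the map is built
ALTER TABLE dbNew.dbo.Audit DROP COLUMN idOld_tmp

The whole load is then two set-based statements against the target instead of 172,000 single-row inserts and updates.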
September 19, 2007 at 12:47 pm
Why don't you just set IDENTITY_INSERT ON, insert the records, set IDENTITY_INSERT OFF, and then run DBCC CHECKIDENT to set the next identity to a number higher than what you inserted?
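The mechanics of that suggestion would look something like this (untested sketch; it assumes the identity column is Audit_SqlID, and as the next reply points out, it only works if the incoming id values don't collide with rows already in the target):

SET IDENTITY_INSERT dbNew.dbo.Audit ON

-- Straight copy shown for the mechanics only; the real load keeps its own filters and lookups
INSERT INTO dbNew.dbo.Audit
    (Audit_SqlID, AuditSession_SqlId, fieldA, fieldB, fieldC, fieldD, fieldE, fieldF, fieldG)
SELECT Audit_SqlID, AuditSession_SqlId, fieldA, fieldB, fieldC, fieldD, fieldE, Report_ID, fieldG
FROM dbOld.dbo.Audit

SET IDENTITY_INSERT dbNew.dbo.Audit OFF

-- Run in dbNew: pushes the identity seed past the highest value just inserted
DBCC CHECKIDENT ('dbo.Audit', RESEED)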
Hope this helps
September 19, 2007 at 12:59 pm
Do you mean insert the ids into the new table directly from the old table? If so, that won't work because those ids are already in use.
Or do you mean generate new ids starting from the next consecutive value?
September 19, 2007 at 3:02 pm
Thanks for the tip, Grasshopper; messing with SET IDENTITY_INSERT is faster still.
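For anyone finding this later, one way the two ideas can fit together, given that the old ids are already in use in the target, is purely a guess at the approach and not necessarily what was actually done here: hand out the new ids above the current maximum first, so the old/new map exists before the load, then insert them explicitly with IDENTITY_INSERT ON.

-- Pre-assign new ids above the current maximum in the target
-- (assumes the identity column is Audit_SqlID and recNum runs 1..N without gaps)
DECLARE @nextId int
SELECT @nextId = ISNULL(MAX(Audit_SqlID), 0) FROM dbNew.dbo.Audit
UPDATE MapIds_Audit SET idNew = recNum + @nextId

SET IDENTITY_INSERT dbNew.dbo.Audit ON

-- Same set-based insert as in the earlier sketch, but supplying the pre-assigned id explicitly
INSERT INTO dbNew.dbo.Audit
    (Audit_SqlID, AuditSession_SqlId, fieldA, fieldB, fieldC, fieldD, fieldE, fieldF, fieldG)
SELECT
    I.idNew
    ,SESS.idNew
    ,A.fieldA
    ,A.fieldB
    ,A.fieldC
    ,A.fieldD
    ,A.fieldE
    ,CASE A.Section
        WHEN 'C' THEN C.idNew
        WHEN 'AR' THEN AR.idNew
        WHEN 'A' THEN M.idNew
        ELSE ''
     END
    ,A.fieldG
FROM dbOld.dbo.Audit AS A
JOIN MapIds_Audit AS I ON A.Audit_SqlID = I.idOld
JOIN MapIds_AuditSession AS SESS ON A.AuditSession_SqlId = SESS.idOld
LEFT OUTER JOIN MapIds_MReport M ON A.Report_ID = M.idOld
LEFT OUTER JOIN MapIds_CReport C ON A.Report_ID = C.idOld
LEFT OUTER JOIN MapIds_AReport AR ON A.Report_ID = AR.idOld

SET IDENTITY_INSERT dbNew.dbo.Audit OFF

-- Run in dbNew: push the identity seed past the pre-assigned ids
DBCC CHECKIDENT ('dbo.Audit', RESEED)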