Insert performance

  • I am writing script to import a clients entire local db into our onsite db. This consists of 50+ tables. Most tables take seconds to a couple minutes to import. But in this one ugly case it is taking hours to import 172,000 records. We need to take down the production server in order to run this, so obviously this is not good.

    What complicates the scenario is the need to keep a mapping of old ids to new ids. Originally when I saw how slow this was, I did a bulk insert instead. This worked and was fast, but the primary key is IDENTITY and reliably mapping the old/new ids after the insert wasn't working.

    Does anyone have any input on how to make this faster, or go back to using the bulk insert with a reliable way to map the old/new ids afterwards?

    DECLARE @ctr int, @recCount int, @strHospIdLow varchar(20), @strHospIdHigh varchar(20)

    SET @strHospIdLow = '0200'

    SET @strHospIdHigh = '0250'

    CREATE TABLE dbo.MapIds_Audit

    (recNum int IDENTITY, idOld varchar(32), idNew varchar(32))

    INSERT INTO MapIds_Audit (idOld)

    SELECT Audit_SqlID

    FROM dbOld.dbo.Audit A

    JOIN dbOld.dbo.AuditSession AuS ON A.AuditSession_SqlId = AuS.AuditSession_SqlId

    WHERE AuS.Hosp_id BETWEEN @strHospIdLow AND @strHospIdHigh AND LEN(AuS.Hosp_id)=4

    SET @ctr = 1

    SET @recCount = (SELECT MAX(recNum) FROM MapIds_Audit)

    WHILE @ctr <= @recCount
    BEGIN
    INSERT INTO dbNew.dbo.Audit
    (AuditSession_SqlId,fieldA,fieldB,fieldC,fieldD,fieldE,fieldF,fieldG)
    SELECT
    SESS.idNew
    ,fieldA
    ,fieldB
    ,fieldC
    ,fieldD
    ,fieldE
    ,CASE A.Section
    WHEN 'C' THEN C.idNew
    WHEN 'AR' THEN AR.idNew
    WHEN 'A' THEN M.idNew
    ELSE ''
    END
    ,fieldG
    FROM dbOld.dbo.Audit AS A
    JOIN MapIds_Audit AS I ON A.Audit_SqlID = I.idOld
    JOIN MapIds_AuditSession AS SESS ON A.AuditSession_SqlId = SESS.idOld
    LEFT OUTER JOIN MapIds_MReport M ON A.Report_ID = M.idOld
    LEFT OUTER JOIN MapIds_CReport C ON A.Report_ID = C.idOld
    LEFT OUTER JOIN MapIds_AReport AR ON A.Report_ID = AR.idOld
    WHERE I.recNum = @ctr

    UPDATE MapIds_Audit SET idNew = SCOPE_IDENTITY() WHERE recnum = @ctr
    SET @ctr = @ctr + 1
    END

  • Looping and inserting 1 record at a time is never going to give you the performance you require.

    I would revert to bulk insert to a staging table, and then remap the IDs via a set-based operation. What was unreliable about mapping the old/new ID's using the bulk insert method ?

     

  • In some cases multiple records have duplicate data (even timestamp) so I couldn't figure out how to match the new inserted recs back to the old table.

    I will put some thought into the staging table...

  • Well I got it down to just over 2 minutes by inserting a column into the new table to store the old id, then using that after the bulk insert to update my mapping table (then dropping the new column). This is great if "the powers that be" will let me manipulate the table structure in this way...

  • Why don't you just set identity_insert on, insert the records, set identity_insert off, and then do a DBCC check_identity to set the next identity to a number higher than you inserted?

    Hope this helps

  • Do you mean insert the ids into the new table directly from the old table? If so, this won't work because those ids are used already.

    Or do you mean generate ids starting from the next consecutive?

  • Thanks for the tip Grasshopper, messing with SET IDENTITY_INSERT is yet even faster

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply