update statement optimization?

  • Hello - I have a table with 6 million rows that I need to update.

    These rows will be updated based on a join to a temp table that has the needed update values.

    This will be a "real-time" update so the record updates need to be done in chunks to avoid locking.

    Someone I know with a deeper SQL background than myself suggested the following structure for performing the update:

    WHILE @@ROWCOUNT > 0

    BEGIN

    update TOP(5000) ceb

    set ceb.new_WCDSCreatedOn = i.CreateDate

    OUTPUT inserted.New_IndividualId

    INTO @Updates

    from contactextensionbase ceb

    join #tmpIndividualCreateDates i on ceb.New_IndividualId = i.IndividualID

    WHERE NOT EXISTS (

    SELECT *

    FROM @Updates AS Updates

    WHERE Updates.New_IndividualId = ceb.New_IndividualId

    )

    END

    I think this SQL is fairly elegant but it doesn't seem to work the way we expect. For example, I executed the SQL and it ran for 2.5 hours without completing. It probably shouldn't take that long to update 6M rows.

    So have you executed an update statement like this before? Does this structure look correct or can you identify any flaws that seem to stand out?

  • Probably the table variable. Up to 6 million rows in an unindexed table variable without stats = bad idea. Try a temp table or permanent table with a useful index.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • First, you need an index on ceb.New_IndividualId: I'll assume you have that already.

    Second, I'd cluster the #tmpIndividualCreateDates table by i.IndividualID: I'll assume you've done that already.

    I guess the OUTPUT clause to INSERT the rows UPDATEd into a new table and then do a NOT EXISTS against them is to prevent re-work ... but in this case, I suspect that's more overhead than the re-work itself would be anyway. I think we need to cut all that overhead.

    Besides, if we process only forward thru the temp table, we shouldn't have that issue, barring errors / restarts.

    At any rate, I suggest something like this:

    DECLARE @IndividualID_start int

    DECLARE @IndividualID_end int

    DECLARE @update_count int

    SELECT @IndividualID_start = -1

    SET @update_count = 1

    WHILE @update_count > 0

    BEGIN

    -- find the 5000th IndividualID value past the last one processed

    SELECT TOP (5000) @IndividualID_end = IndividualID

    FROM #tmpIndividualCreateDates

    WHERE

    IndividualID > @IndividualID_start

    ORDER BY

    IndividualID

    UPDATE ceb

    SET

    ceb.new_WCDSCreatedOn = i.CreateDate

    FROM dbo.contactextensionbase ceb

    INNER JOIN #tmpIndividualCreateDates i ON

    ceb.New_IndividualId = i.IndividualID AND

    ceb.New_IndividualId > @IndividualID_start AND

    ceb.New_IndividualId <= @IndividualID_end AND

    i.IndividualID > @IndividualID_start AND

    i.IndividualID <= @IndividualID_end

    SET @update_count = @@ROWCOUNT

    SET @IndividualID_start = @IndividualID_end

    END

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • DECLARE @Blocksize AS BIGINT = 5000,

    @pagenum AS BIGINT = 0,

    @ROWCNT AS BIGINT = 1;

    While @ROWCNT > 0

    BEGIN

    Update ceb set CEB.new_WCDSCreatedOn = D.CreateDate

    from contactextensionbase ceb join

    (SELECT IndividualID,CreateDate

    from #tmpIndividualCreateDates

    order by IndividualID

    OFFSET @Blocksize * @pagenum ROWS FETCH next @Blocksize ROWS ONLY) d

    on ceb.New_IndividualId = d.IndividualID

    set @ROWCNT = @@ROWCOUNT

    SET @pagenum = @pagenum + 1

    END

  • lgegerton (5/16/2013)


    DECLARE @Blocksize AS BIGINT = 5000,

    @pagenum AS BIGINT = 0,

    @ROWCNT AS BIGINT = 1;

    While @ROWCNT > 0

    BEGIN

    Update ceb set CEB.new_WCDSCreatedOn = D.CreateDate

    from contactextensionbase ceb join

    (SELECT IndividualID,CreateDate

    from #tmpIndividualCreateDates

    order by IndividualID

    OFFSET @Blocksize * @pagenum ROWS FETCH next @Blocksize ROWS ONLY) d

    on ceb.New_IndividualId = d.IndividualID

    set @ROWCNT = @@ROWCOUNT

    SET @pagenum = @pagenum + 1

    END

    This won't work with SQL Server 2008.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply