update statement optimization?

Question

update statement optimization?

sqlguy-736318

Hall of Fame

Points: 3859
More actions
May 16, 2013 at 2:25 pm

#272810

Hello - I have a table with 6 million rows that I need to update.
These rows will be updated based on a join to a temp table that has the needed update values.
This will be a "real-time" update so the record updates need to be done in chunks to avoid locking.
Someone I know with a deeper SQL background than myself suggested the following structure for performing the update:
WHILE @@ROWCOUNT > 0
BEGIN
update TOP(5000) ceb
set ceb.new_WCDSCreatedOn = i.CreateDate
OUTPUT inserted.New_IndividualId
INTO @Updates
from contactextensionbase ceb
join #tmpIndividualCreateDates i on ceb.New_IndividualId = i.IndividualID
WHERE NOT EXISTS (
SELECT *
FROM @Updates AS Updates
WHERE Updates.New_IndividualId = ceb.New_IndividualId
)
END
I think this SQL is fairly elegant but it doesn't seem to work the way we expect. For example, I executed the SQL and it ran for 2.5 hours without completing. It probably shouldn't take that long to update 6M rows.
So have you executed an update statement like this before? Does this structure look correct or can you identify any flaws that seem to stand out?

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 1

Probably the table variable. Up to 6 million rows in an unindexed table variable without stats = bad idea. Try a temp table or permanent table with a useful index.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

ScottPletcher SSC Guru Points: 101130 More actions · Answer 2

First, you need an index on ceb.New_IndividualId: I'll assume you have that already.

Second, I'd cluster the #tmpIndividualCreateDates table by i.IndividualID: I'll assume you've done that already.

I guess the OUTPUT clause to INSERT the rows UPDATEd into a new table and then do a NOT EXISTS against them is to prevent re-work ... but in this case, I suspect that's more overhead than the re-work itself would be anyway. I think we need to cut all that overhead.

Besides, if we process only forward thru the temp table, we shouldn't have that issue, barring errors / restarts.

At any rate, I suggest something like this:

DECLARE @IndividualID_start int

DECLARE @IndividualID_end int

DECLARE @update_count int

SELECT @IndividualID_start = -1

SET @update_count = 1

WHILE @update_count > 0

BEGIN

-- find the 5000th IndividualID value past the last one processed

SELECT TOP (5000) @IndividualID_end = IndividualID

FROM #tmpIndividualCreateDates

WHERE

IndividualID > @IndividualID_start

ORDER BY

IndividualID

UPDATE ceb

SET

ceb.new_WCDSCreatedOn = i.CreateDate

FROM dbo.contactextensionbase ceb

INNER JOIN #tmpIndividualCreateDates i ON

ceb.New_IndividualId = i.IndividualID AND

ceb.New_IndividualId > @IndividualID_start AND

ceb.New_IndividualId <= @IndividualID_end AND

i.IndividualID > @IndividualID_start AND

i.IndividualID <= @IndividualID_end

SET @update_count = @@ROWCOUNT

SET @IndividualID_start = @IndividualID_end

END

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

lgegerton Old Hand Points: 316 More actions · Answer 3

DECLARE @Blocksize AS BIGINT = 5000,

@pagenum AS BIGINT = 0,

@ROWCNT AS BIGINT = 1;

While @ROWCNT > 0

BEGIN

Update ceb set CEB.new_WCDSCreatedOn = D.CreateDate

from contactextensionbase ceb join

(SELECT IndividualID,CreateDate

from #tmpIndividualCreateDates

order by IndividualID

OFFSET @Blocksize * @pagenum ROWS FETCH next @Blocksize ROWS ONLY) d

on ceb.New_IndividualId = d.IndividualID

set @ROWCNT = @@ROWCOUNT

SET @pagenum = @pagenum + 1

END

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 4

lgegerton (5/16/2013)
DECLARE @Blocksize AS BIGINT = 5000,
@pagenum AS BIGINT = 0,
@ROWCNT AS BIGINT = 1;
While @ROWCNT > 0
BEGIN
Update ceb set CEB.new_WCDSCreatedOn = D.CreateDate
from contactextensionbase ceb join
(SELECT IndividualID,CreateDate
from #tmpIndividualCreateDates
order by IndividualID
OFFSET @Blocksize * @pagenum ROWS FETCH next @Blocksize ROWS ONLY) d
on ceb.New_IndividualId = d.IndividualID
set @ROWCNT = @@ROWCOUNT
SET @pagenum = @pagenum + 1
END

This won't work with SQL Server 2008.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden