March 15, 2004 at 11:50 am
Hi everyone, my question is, if I have an update statement that updates 30+ million rows, is there a way to make this update commit every few thousand rows that it updates? Thanks in advance.
March 15, 2004 at 12:04 pm
Put your update statement in a loop, update only those that fit the criteria, increment your criteria, loop, and then update again.
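Something like this, maybe. This is only a sketch: [bigtable], [id], and [somecol] are made-up names, and it assumes the table has an increasing integer key you can window on.

```sql
-- Batched update loop: commit one window of rows per iteration.
-- [bigtable], [id], and [somecol] are placeholder names.
declare @lo int, @hi int, @maxid int, @batch int
set @batch = 10000
select @maxid = max([id]) from [bigtable]
select @lo = 0, @hi = @batch

while (@lo <= @maxid)
begin
    begin transaction
    update [bigtable]
    set [somecol] = 'new value'
    where [id] > @lo and [id] <= @hi
      -- ... the real update criteria go here ...
    commit transaction

    -- increment the criteria and loop
    select @lo = @hi, @hi = @hi + @batch
end
```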
mom
March 15, 2004 at 2:46 pm
I think that using a loop (especially to iterate through a cursor) might be a good idea here, so I'd agree with mom ... BUT ... you don't want to commit every loop iteration if you're retrieving one row per iteration. That would kill your performance (I could go into painful detail here, but it works better on a whiteboard). I'm pretty sure that mom means the technique where your where clause allows the update of a bunch of rows every loop iteration, so that each commit also handles a bunch of rows. But I thought it would be worth mentioning explicitly, because (ouch!) issuing a commit on each row update can just destroy your throughput.
Cheers,
Chris
March 15, 2004 at 5:53 pm
The technique I often use to keep the log from growing too large is SET ROWCOUNT. It controls exactly how many rows each statement updates.

March 15, 2004 at 9:25 pm
That's a good technique, but it's easiest to use set rowcount on deletes (because you don't have to adjust the SQL statement - just run it repeatedly in a loop until you run out of rows to delete). Remember, shahgols needs to do updates, and so will need something a little more sophisticated than a "vanilla" set rowcount loop. I assume this is what mom was referring to also - in this case the where clause of the update will probably have to be dynamically modified with each iteration to produce a windowed effect so that every row is updated at least once. Either that, or updating through a cursor, counting the number of iterations, and doing a commit / begin transaction every Nth iteration, with one more cleanup commit at the end.
The "windowed where clause" approach can be more efficient - sometimes - but the "cursor with commits every Nth row" approach is general and can always be used (sometimes the "windowed where clause" can be too complex for mere mortals to implement when the where clause has to be more complex than just identifying a range of integer primary key values).
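For the record, here is roughly what both loops look like. These are sketches only: [bigtable] and [somecol] are placeholder names, and SET ROWCOUNT is the SQL Server 2000-era way to cap rows per statement.

```sql
-- "Vanilla" SET ROWCOUNT loop, as it works for deletes:
-- just rerun the same statement until nothing is left.
declare @rows int
set rowcount 5000
set @rows = 1
while (@rows > 0)
begin
    delete from [bigtable] where [somecol] = 'obsolete'
    select @rows = @@rowcount
end
set rowcount 0   -- always reset, or later statements stay capped

-- For an update, the same loop only works if the update itself
-- falsifies the where clause, so each pass picks up fresh rows:
set rowcount 5000
set @rows = 1
while (@rows > 0)
begin
    update [bigtable]
    set [somecol] = 'new value'
    where [somecol] <> 'new value'
    select @rows = @@rowcount
end
set rowcount 0
```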
Cheers,
Chris
March 16, 2004 at 7:43 am
Maybe you can use an UPDATE in combination with a SELECT TOP 1000.
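One way to read that suggestion (placeholder names again): repeatedly update only the rows whose keys come back from a SELECT TOP 1000 of the work remaining, until no rows are left to change.

```sql
-- [bigtable], [id], and [somecol] are placeholder names.
declare @rows int
set @rows = 1
while (@rows > 0)
begin
    update [bigtable]
    set [somecol] = 'new value'
    where [id] in (select top 1000 [id]
                   from [bigtable]
                   where [somecol] <> 'new value'   -- work still remaining
                   order by [id])
    select @rows = @@rowcount
end
```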
March 16, 2004 at 6:21 pm
The hard thing about this problem is tracking the work you've done and the work remaining. Ideally you want to be able to track a logical transaction against the large table. I think shahgols probably wants to be able to commit 1000 updated rows, not whatever happens to be updated every 1000 rows inspected.
I think one good approach is to snapshot chunks of the desired table updates into a temporary table using TOP, and then update the table by joining it with the temporary table. This batches the update into your preferred 1000 rows at a time and allows for scanning large chunks of the table under the update constraints on each loop pass.
This does require an amenable primary key, and of course a correctly ordered select query plan by pk under the joins for the calculation of new column values, so it may cause overhead if the clustering isn't on the pk.
Also, I wouldn't recommend this if the pks are somehow mutating.
Another thing to think about is avoiding a second instance of such an update. You'll probably want some form of incremental-update semaphore to keep a second instance of the same update from running against the table concurrently.
Here's a brief outline of what I mean by the snapshot chunking and updates.
Any comments on this approach would be welcomed.
Peter Evans (__PETER Peter_)
-- Using the status column in the table approach.
-- NOTE: several object names were lost from the original post;
-- [target table] and the incrementalupdatekeys boundary table
-- below are placeholders reconstructing the intent.
declare @runtimeid bigint
declare @ROWSUPDATED bigint
-- Get semaphore
update incrementalupdatejobids
set jobrunning = runtimeid + 1
, runtimeid = runtimeid + 1
from incrementalupdatejobids
where name = 'table'
and updatetaskname = 'updatetaskname'
and jobrunning is null
select @ROWSUPDATED = @@ROWCOUNT
if (@ROWSUPDATED > 0) -- Got the update-task-specific record
begin
-- Set iterator range values to track the job.
select @runtimeid = runtimeid
from incrementalupdatejobids
where name = 'table'
and updatetaskname = 'updatetaskname'
and jobrunning is not null
insert into incrementalupdatekeys
(runtimeid, KEYBOUNDARY)
select @runtimeid, NULL
-- CREATE TABLE [#rowids w updated values] ([ROWID] ..., COL1 ..., ..., COLN ...)
-- Loop over range
while (@ROWSUPDATED > 0)
begin
-- Get a calculated snapshot of the next 1000 rows to process.
insert into [#rowids w updated values]
select TOP 1000
[target table].[ROWID]
, (desired updated value expression) as COL1
, (desired updated value expression) as COL2
-- ...
, (desired updated value expression) as COLN
from [target table]
-- OTHER CONSTRAINT TABLE JOINS GO HERE
-- NOTE: MOVE THE [ROWID] constraints into the first join clause
-- to indicate precedence for [ROWID] filtering
-- NOTE: (TRUE OR UNKNOWN) evaluates to TRUE
where (
(
[target table].[ROWID] > (
select KEYBOUNDARY
from incrementalupdatekeys
where runtimeid = @runtimeid))
or
(
[target table].[ROWID] >= (
select MIN([target table].[ROWID])
from [target table]
where not exists (
select * from incrementalupdatekeys
where runtimeid = @runtimeid
)
)
)
)
-- AND ... NORMAL UPDATE CONSTRAINTS GO HERE
order by [target table].[ROWID]
-- Update the table's values
update [target table]
set [updated column 1] = [#rowids w updated values].[COL1]
, [updated column 2] = [#rowids w updated values].[COL2]
-- ...
from [target table]
inner join [#rowids w updated values]
on [target table].[ROWID] = [#rowids w updated values].[ROWID]
select @ROWSUPDATED = @@ROWCOUNT
-- Record the high-water mark for the next pass.
update incrementalupdatekeys
set KEYBOUNDARY = (select MAX([ROWID]) from [#rowids w updated values])
where runtimeid = @runtimeid
delete [#rowids w updated values]
end
-- remove job as running from incremental updates tracking table
update incrementalupdatejobids
set jobrunning = NULL
from incrementalupdatejobids
where name = 'target table name'
and updatetaskname = 'arbitrary update task name'
and jobrunning is not null
end
March 16, 2004 at 9:10 pm
The most interesting thing about this thread - to me - is that all of us who have posted probably understand one or more of the common ways to do this, but the thread originator - shahgols - hasn't responded yet. Of course, it's only been about a day and a half, but since the original request was not very specific, it would be interesting to know more specifics about his or her very solvable problem. In the meantime ... nice talking with the rest of you, I guess!