Which is Quicker? - updating 90 million records

  • Hi, I've got a table of 60 million (ish) records I need to update in chunks.

    Which is the quicker approach (or doesn't it even matter?):

    a higher number of iterations, or a higher number of records per update?

    Many thanks for any words of wisdom

    DECLARE @COUNTER INT

    SET @COUNTER = 1

    WHILE @COUNTER < 8
    BEGIN

    BEGIN TRANSACTION

    UPDATE TOP (800000) A
    SET A.FIELD1 = B.FIELD2 + B.FIELD3, A.UPDATED = 1
    FROM TABLEA A
    INNER JOIN TABLEB B
    ON A.ID = B.ID
    AND A.RECNO = B.RECNO
    WHERE A.UPDATED IS NULL

    SET @COUNTER = @COUNTER + 1

    COMMIT TRANSACTION

    END

  • As you might expect, "it depends" is the most appropriate answer.

    Keep in mind that bigger batches require more T-LOG space.

    It also depends on how expensive the query to find the rows to update is compared to the update itself. If the INNER JOIN I see in your query costs more than updating the data, bigger batches will likely be faster.

    -- Gianluca Sartori
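    One common shape for this (a sketch only, reusing the table and column names from the question; the batch size is just a starting point to tune) is to loop until the UPDATE touches no rows instead of hard-coding the iteration count, committing between batches so the log space can be reused:

```sql
-- Sketch: batch until no rows remain; TABLEA/TABLEB/FIELD names come from the question.
DECLARE @BatchSize INT = 800000,
        @Rows      INT = 1;

WHILE @Rows > 0
BEGIN
    BEGIN TRANSACTION;

    UPDATE TOP (@BatchSize) A
    SET    A.FIELD1  = B.FIELD2 + B.FIELD3,
           A.UPDATED = 1
    FROM   TABLEA A
    INNER JOIN TABLEB B
            ON  A.ID    = B.ID
            AND A.RECNO = B.RECNO
    WHERE  A.UPDATED IS NULL;

    SET @Rows = @@ROWCOUNT;

    COMMIT TRANSACTION;

    CHECKPOINT; -- lets the log truncate between batches under SIMPLE recovery;
                -- under FULL recovery, take log backups instead
END;
```

    Making @BatchSize a variable makes it easy to test both ends of the trade-off Gianluca describes on your own hardware.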

  • thanks Gianluca, much appreciated 🙂

  • I'll say that this would be slower because you're still using a single transaction and you're using several queries. This is a guess and I might be wrong.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (1/23/2015)


    I'll say that this would be slower because you're still using a single transaction and you're using several queries. This is a guess and I might be wrong.

    I also believe that a single transaction will play hell with the log file. The transaction trick is great for nearly doubling the performance of a RBAR While loop, but that's not what this one is. It's a "set size control" rather than RBAR.

    I'd say that it would be better to just lock the whole table rather than doing this in an explicit transaction if you can afford the luxury of excluding all other users.

    You also have to remember that every system has a "tipping point" where, for example, updating 3 million rows might only take 3 times the duration as updating a million but 4 million rows (again, for example) might take an hour or two.

    Doing a million rows at a time isn't going to be horribly slower overall than doing 2 million rows if things are done correctly. I also wouldn't use an "Updated" column as the primary search for the update. I'll have to explain what I mean by that later tonight.

    In the meantime, what is the definition of the clustered index on this table, and what are the datatypes of the columns in that clustered index? I ask because I believe this can be made quite a bit faster, but I need that information to be sure.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
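    As an aside, the clustered index definition and key datatypes Jeff asks for can be read straight from the catalog views (a sketch; the dbo.TableA name is assumed from later in the thread):

```sql
-- Sketch: key columns and datatypes of the clustered index on dbo.TableA.
SELECT  i.name   AS index_name,
        ic.key_ordinal,
        c.name   AS column_name,
        t.name   AS data_type
FROM    sys.indexes        i
JOIN    sys.index_columns  ic ON ic.object_id   = i.object_id
                             AND ic.index_id    = i.index_id
JOIN    sys.columns        c  ON c.object_id    = ic.object_id
                             AND c.column_id    = ic.column_id
JOIN    sys.types          t  ON t.user_type_id = c.user_type_id
WHERE   i.object_id = OBJECT_ID('dbo.TableA')
  AND   i.type = 1            -- 1 = clustered
  AND   ic.key_ordinal > 0    -- key columns only, not included columns
ORDER BY ic.key_ordinal;
```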

  • Hey Jeff

    thanks for the input

    as far as I was concerned the updated flag was a poor choice, but it was the only one I had; the updated data isn't visually distinguishable from the data it's replacing

    the PK is a clustered composite on the fields used for the join

    regards, Simon!

  • Jeff Moden (1/23/2015)


    ....I'd say that it would be better to just lock the whole table rather than doing this in an explicit transaction if you can afford the luxury of excluding all other users.

    I know this is a bit of a tangent from the original question, but this reminded me of something I do at my company, similar to (or literally) read committed snapshot behavior. This way the table as it was before the updates began remains online while you perform all the locking transactions your heart could desire.

    You can also do this manually with a set of views and clones of the same table so you can do work on one and then alter the view to swap out which table you're pointing at when the work (in this case, updates) are finished.

    Executive Junior Cowboy Developer, Esq.

  • simon_s (1/23/2015)


    Hey Jeff

    thanks for the input

    as far as I was concerned the updated flag was a poor choice, but it was the only one I had; the updated data isn't visually distinguishable from the data it's replacing

    the PK is a clustered composite on the fields used for the join

    regards simon !

    In that case (and assuming that you're trying to update all 60 million rows), I have an idea. Before we do a deep dive on the idea, tell me how long the following query takes to run. I'm thinking about 12-15 seconds.

    WITH
    cteEnumerate AS
    (
    SELECT RowNum = ROW_NUMBER() OVER (ORDER BY ID, RECNO)
    ,ID
    FROM dbo.TableA
    )
    SELECT *
    FROM cteEnumerate
    WHERE RowNum % 1000000 = 0
    ;

    --Jeff Moden


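    A guess at where this is heading (not stated outright in the thread): the query above returns every millionth boundary key in clustered-index order, and those boundaries can drive range-based batches that seek on the clustered key instead of rescanning for UPDATED IS NULL each pass. A sketch, assuming the boundary IDs were saved into a hypothetical #Boundaries table (and ignoring RECNO ties at the boundaries, which a real version would handle):

```sql
-- Sketch: walk clustered-key ranges captured from the enumeration query.
-- Assumes #Boundaries holds (RowNum, ID) for every millionth row.
DECLARE @LoID INT = 0, @HiID INT;

DECLARE BoundaryCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT ID FROM #Boundaries ORDER BY RowNum;

OPEN BoundaryCursor;
FETCH NEXT FROM BoundaryCursor INTO @HiID;

WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE A
    SET    A.FIELD1 = B.FIELD2 + B.FIELD3
    FROM   dbo.TableA A
    INNER JOIN TABLEB B
            ON  A.ID    = B.ID
            AND A.RECNO = B.RECNO
    WHERE  A.ID > @LoID
      AND  A.ID <= @HiID;   -- range seek on the leading clustered-key column

    SET @LoID = @HiID;
    FETCH NEXT FROM BoundaryCursor INTO @HiID;
END;

CLOSE BoundaryCursor;
DEALLOCATE BoundaryCursor;

-- One final pass for rows above the last boundary would still be needed.
```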

  • Hey Jeff

    takes 4 minutes and returns 555 rows

    thanks simon

