November 12, 2015 at 2:43 pm
I was advised to use ROWCOUNT when performing mass updates against a massive table, so I tested the idea. I ran two queries: one using a straightforward UPDATE, and a second using ROWCOUNT 1000 to update the values in the UnitCost column of the AdventureWorksDW.dbo.FactProductInventory table from 0.32 to 0.33. Table size: 776,286 rows.
I am on my local computer, the databases are newly installed, and no one else is logged on to the machine, so when I run either query there is no difference in the size of tempdb according to Perfmon.
I found the 'batch' approach takes far more time (over 5 minutes) and costs more according to % Processor Time (as viewed in Perfmon). The only things that seemed better were the IO and operator costs.
Here's what I compared:
----UPDATE took less than 1 second
----IO cost 25.9897
----OPERATOR COST 26.76603
UPDATE [dbo].[FactProductInventory]
SET [UnitCost] = 0.33
----ROWCOUNT METHOD took 5 minutes 21 seconds
----IO COST 2.85120
----OPERATOR COST 3.70534
SET ROWCOUNT 1000
WHILE (1 = 1)
BEGIN
    BEGIN TRANSACTION

    -- Update the next 1000 not-yet-updated rows
    UPDATE TOP (1000) [dbo].[FactProductInventory]
    SET [UnitCost] = 0.32
    WHERE [UnitCost] = 0.33

    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION
        BREAK
    END

    COMMIT TRANSACTION
END
Will someone clarify for me whether IO cost is the key reason that the batch approach is considered better? Because the straight UPDATE was practically immediate, I don't see the benefit of the batch update, which has a predicate. What other metrics should I be looking at in the execution plan when deciding whether the ROWCOUNT batch update was better than the straight update, and how would I view them?
--Quote me
November 12, 2015 at 3:47 pm
Batch updating is kinder to other, concurrent activities. If a bulk update takes 5 minutes, that's 5 minutes during which other queries might have to wait to do their thing.
Gerald Britton, Pluralsight courses
November 12, 2015 at 4:47 pm
Yeah, to add to what g.britton said, some of the ideas with batching are:
1) Holding locks for a shorter duration
2) Better control over log growth; based on whether the DB is using the Full or Simple recovery model, log backups or checkpoints can be run every so often during the process to minimize unwanted log growth. The single transaction from doing it in one shot doesn't give you that luxury.
3) In the case of a rollback, you don't lose all the work, just the work from the most recent batch (if you're batching the updates in separate transactions, I'll assume you've determined that's desirable and the changes don't have to be atomic, i.e., all fail or all succeed together).
The downside in a case like this is that each execution of the UPDATE will require a search to find rows to update. If that search is a table scan, then it's going to require more and more reads each time to find a new 1000 rows to UPDATE, since those rows will be less and less of the table. Even with TOP allowing the scan to stop once it's found 1000 rows, those 777 scans (in this case, based on total rows and batch size) are going to easily be much more expensive altogether than the single scan of the whole table for the one-shot UPDATE.
With a larger batch size you'll probably get a better balance of the overhead of the multiple scans and the benefits of batching. Play around with the batch size and compare the results, maybe starting with something like 50000.
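Something along these lines is what I have in mind (a rough, untested sketch, assuming the Simple recovery model on your local copy; the 50000 batch size and the CHECKPOINT step are just illustrative, adjust them to your environment):

DECLARE @BatchSize int = 50000;

WHILE 1 = 1
BEGIN
    -- Each iteration updates at most @BatchSize rows in its own autocommit transaction
    UPDATE TOP (@BatchSize) [dbo].[FactProductInventory]
    SET [UnitCost] = 0.32
    WHERE [UnitCost] = 0.33;

    IF @@ROWCOUNT = 0
        BREAK;

    -- Under SIMPLE recovery a CHECKPOINT lets committed log space be reused;
    -- under FULL recovery you'd take periodic log backups instead
    CHECKPOINT;
END;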
Cheers!
November 15, 2015 at 1:25 pm
Thanks for the replies. I appreciate the advice about tuning the batch size.
--Quote me
November 15, 2015 at 2:00 pm
Also note that your tests are not actually equivalent.
The first one updates all rows to a particular value, not, as you indicated, "update the values in the UnitCost column of the AdventureWorksDW.dbo.FactProductInventory table from 0.32 to 0.33".
It updated the values from whatever they were to 0.33:
UPDATE [dbo].[FactProductInventory]
SET [UnitCost] = 0.33
-- no where clause here
whereas the batched update does have a WHERE clause.
750k rows is not a particularly large table. Batch updates/deletes tend to be used with large tables, to reduce the chances of lock escalation, control the log growth, reduce the impact on other processes, etc.
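To illustrate the lock escalation point (a rough sketch only, not something I've run against your table): escalation to a table lock is typically attempted once a single statement holds roughly 5,000 locks, which is one reason batch sizes are often kept below that, or escalation is managed at the table level:

-- Keeping a batch comfortably under the ~5,000-lock escalation threshold
-- usually avoids the statement escalating to a full table lock
UPDATE TOP (4000) [dbo].[FactProductInventory]
SET [UnitCost] = 0.32
WHERE [UnitCost] = 0.33;

-- Alternatively (SQL Server 2008+), escalation behaviour can be set per table
ALTER TABLE [dbo].[FactProductInventory] SET (LOCK_ESCALATION = DISABLE);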
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 15, 2015 at 2:15 pm
1. Oh no, I mentioned that :-).
2. I was still surprised that on such a small table the batches took over 5 minutes to complete.
3. The effect on % processor use was then stretched over the whole 5 minutes of batch processing.
4. I understand each of you to be saying that the extra time it takes to process in batches may be worth it because it doesn't lock the table and keep it from being used by other transactions. I have been in a situation, though, where the maintenance window for updating tables was narrow and the straight-up update (no predicate) would be better because it's faster.
--Quote me
November 15, 2015 at 4:13 pm
polkadot (11/15/2015)
I have been in a situation, though, where the maintenance window for updating tables was narrow and the straight-up update (no predicate) would be better because it's faster.
Now try with zero maintenance window, lots and lots of concurrent reads and a table a few hundred times larger.
When doing updates or deletes by batch, wrapping the entire thing in a single transaction is counter-productive. Doing so means it takes longer (because of the batches) and still locks and logs as if it were a single update, so you get the worst of both worlds. Plus, there's no error handling in your transaction, so it'll commit entirely anyway.
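If you do keep explicit per-batch transactions, the usual pattern is something along these lines (a rough sketch only, assuming SQL Server 2012 or later for THROW):

WHILE 1 = 1
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        UPDATE TOP (1000) [dbo].[FactProductInventory]
        SET [UnitCost] = 0.32
        WHERE [UnitCost] = 0.33;

        IF @@ROWCOUNT = 0
        BEGIN
            COMMIT TRANSACTION;
            BREAK;
        END;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- Roll back only the failing batch and re-raise, rather than carrying on silently
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        THROW;
    END CATCH;
END;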
The table's too small to show the impact of an update/delete on a huge table. I suspect, being in a DW, it also doesn't have the number of nonclustered indexes usual in a transactional system (which is usually where this kind of thing is needed), which means less impact from the full-table update.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability