February 18, 2013 at 3:45 am
Guys,
I'm wondering of the implications of two different update methods, say you're changing a bunch of titles from Mr to Mrs (or something similar), for say 300 records.
1. 'Batch'
UPDATE Contact
SET Title = 'Mrs'
WHERE ConID in
(
1,2,3,4..n up to 300 or so.
)
2. Row by row
UPDATE Contact SET Title = 'Mrs' WHERE ConID = 1
UPDATE Contact SET Title = 'Mrs' WHERE ConID = 2
UPDATE Contact SET Title = 'Mrs' WHERE ConID = 3
UPDATE Contact SET Title = 'Mrs' WHERE ConID = n... up to 300 or so
I've tried both of these with an actual example, the row by row tool 10 seconds vs the batch taking just 1 second, clearly that's the 'quicker' option.
What are the other implications for these two methods which ultimately achieve the same outcome?
If you were using the (slower) row by row approach on live data which could at any moment be updated via a user would this be a safer approach? - The one update statement would fail? - Under the batch it could run for ages.
Just curious, I've not thought about it that much but a deadlock on live data via the batch method last week made me wonder this morning if the row by row would have been better.
February 18, 2013 at 5:48 am
Batch - faster, less resources, consistency at a statement level
Row by row - slow, more locks, no automatic consistency, you need to use transactions (if one update fails, the rest will succeed and you'll have to test and check for that)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 18, 2013 at 5:52 am
Thanks for the response. In the situation I faced last week where I was updating contacts, about 2,000, on a local version of the database this took 1 second, on the live version, it completed in just over 1 min, causing a deadlock and timeouts for a few people!
I was curious as to whether the row by row would have been better because (I presume the problem was cause becasuse someone was trying to update a contact I had in my batc) as it would have looked at a contact one at a time?
Not sure if I'm explaining myself that well, hopefully the above makes sense.
February 18, 2013 at 6:08 am
Row by row better? Well, in the sense that if one fails the whole batch won't fail, but then you have no guarantee that all the contacts were updated.
A single-row update can still cause timeouts and deadlocks.
Personally that kind of update I do out of business hours.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 18, 2013 at 6:15 am
That was my decision in the end to be honest, out of business hours. Worked out okay for this situation. I guess I was just thinking if one was foced to do it in business hours if this would be better or not.
I suppose one could run a simple select on the universe being updated to verify they had been updated, still, the batch/set approach would be better I guess.
February 18, 2013 at 7:44 am
Rob-350472 (2/18/2013)
That was my decision in the end to be honest, out of business hours. Worked out okay for this situation. I guess I was just thinking if one was foced to do it in business hours if this would be better or not.I suppose one could run a simple select on the universe being updated to verify they had been updated, still, the batch/set approach would be better I guess.
For large tables, I usually do a hybrid of both...basically a while loop with an update of small batches ( It requires more code and 1 or 2 staging tables to help keep track of the records and the completed batches in case you have to stop the process). But you want to do everything in your power to avoid doing 1 row per transaction. Not only will it take forever, but it will cause rampage in your transaction log. Even if you only did 5 or 20 rows per transaction, the amount of time and transaction log IO you use is cut down is pretty significant compared to only doing 1 row at time.
But for a table that is being used during business hours, the primary goal is to make sure each transaction still happens pretty instantaneous, so of course there is a limit to how many # of rows you can include in each transaction.
February 18, 2013 at 10:26 am
Gabriel, that makes quite a lot of sense, I guess I could have broken my say 2,000 into blocks of 500 perhaps, however, in that instance out hours makes more sense as it's not mission critical to be done instantly.
February 20, 2013 at 2:17 am
Rob-350472 (2/18/2013)
I'm wondering of the implications of two different update methods, say you're changing a bunch of titles from Mr to Mrs (or something similar), for say 300 records.
The batch approach is certainly more efficient for the update operation; that side of things has been well-covered in previous comments.
A couple of things in your opening post look odd to me though:
1. Both batch and row-by-row queries feature hard-coded numeric values, which are presumably keys for the table concerned. It is much more common to use a dynamic selection query, or to use keys from a table (usually a temporary one). Do the queries concerned really look like this?
2. Neither query checks that the condition that caused them to be selected for change, still applies. Your example is a change from 'Mr' to 'Mrs' - in that context, the update queries ought to check that the source record still contains 'Mr', otherwise you risk a lost update (where someone else has changed the data in the meantime, their change is lost).
As far as choosing between the two approaches is concerned, it all depends on your objectives. Extremely busy OLTP systems generally opt for a very defensive approach. In practice, this often means row-by-row changes with a small delay between each. Combined with other defensive measures like SET DEADLOCK_PRIORITY LOW, and setting LOCK_TIMEOUT or using the NOWAIT table hint, this helps to ensure changes are applied at a rate the server can tolerate, with the update activity taking a back seat to regular database activity. This is often much more important than the efficiency of the updates.
On to specifics. If there is a useful index on the Title column, I might write the *batch update* something like:
SET DEADLOCK_PRIORITY LOW;
SET LOCK_TIMEOUT 0;
WHILE 1 = 1
BEGIN
UPDATE TOP (300) dbo.Contact
SET Title = 'Mrs'
WHERE Title = 'Mr';
IF @@ROWCOUNT = 0 BREAK;
END;
If there is no useful index on Title, something like:
CREATE TABLE #Selected (pk integer PRIMARY KEY);
CREATE TABLE #Batch (pk integer PRIMARY KEY);
-- All rows to be processed at some point
INSERT #Selected (pk)
SELECT c.pk
FROM dbo.Contact AS c
WHERE c.title = N'Mr';
WHILE 1 = 1
BEGIN
-- Move up to 300 keys from #Selected to #Batch
INSERT #Batch (pk)
SELECT s.pk
FROM
(
DELETE TOP (300) #Selected
OUTPUT DELETED.pk
) AS s;
-- Finished if no rows in the current batch
IF @@ROWCOUNT = 0 BREAK;
-- Do the batch update, checking the selection condition still applies
UPDATE dbo.Contact
SET title = N'Mrs'
WHERE pk IN (SELECT b.pk FROM #Batch AS b)
AND title = N'Mr';
-- Clear the batch table
TRUNCATE TABLE #Batch;
END;
You would need to add error handling (for example where we are the victim of a lock time-out or deadlock), but you get the general idea, I'm sure.
Overall, the 'best' solution depends on your system and business requirements. I have worked with clients with such high transaction rates and availability requirements that the only option was to trickle-feed single-row changes with WAITFOR DELAY '00:00:00.010'; between each.
February 22, 2013 at 7:53 am
Paul, thanks for the reply, delayed response but to answer your two questions:
1. Both batch and row-by-row queries feature hard-coded numeric values, which are presumably keys for the table concerned. It is much more common to use a dynamic selection query, or to use keys from a table (usually a temporary one). Do the queries concerned really look like this?
Yes! The queries actually do look like this, basically someone else will hand me an excel spreadsheet of IDs they've reviewed and want the changes made fo - I'll just concatenate these IDs in Excel and run the update.
2. Neither query checks that the condition that caused them to be selected for change, still applies. Your example is a change from 'Mr' to 'Mrs' - in that context, the update queries ought to check that the source record still contains 'Mr', otherwise you risk a lost update (where someone else has changed the data in the meantime, their change is lost).
This is true and the actual queries would do that check, for similicity I omitted that here.
The defensive approach you mention is very interesting and introduces a few ideas I'd either not heard of or not thought of, I don't think we're at the stage that these would be necessary at the moment but it's something I'd like to play with before they do become a necessity, thanks for the information there 🙂
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply