May 13, 2013 at 9:23 pm
Background:
I've added a new column to a table with some 800 million rows. I need the column populated with data from another table in the db. I do this in batches using a where clause between two boundary values. I do this in a loop incrementing the boundary values by a batch size. I do this to keep the batch size within managable levels (preventing paging etc).
I set the batch size to 100,000 and each loop was taking 4 minutes. At that rate, it would take almost 4 days to complete. So I started to investigate, first up, take the query and look at the execution plan. So I took the guts of the query, replace the variables with scalar constants and ran it, it takes under a second to run - wierd - ok maybe it was cached, so I run it over a record batch I know I haven't updated yet, same result.
I run the loop again, and look at the execution plan and surprise, they're different. Update statistics, no change. Can someone please tell me why this is occuring?
UPDATE TranItem
SET [TransactionDate] = [Transaction].[TransTime]
FROM TranItem
INNER JOIN [Transaction] on [TranItem].TransactionId = [Transaction].TransactionId
WHERE TranItemID between 671726 and 771726
Yeilds the first execution plan - 2013-05-14 13_09_15.png
DECLARE @BatchSize int
DECLARE @IDStart int
DECLARE @IDEnd int
SET @IDStart = 1
SET @IDEnd = 900000000
SET @BatchSize = 100000
While @IDStart < @IDEnd BEGIN
UPDATE TranItem
SET [TransactionDate] = [Transaction].[TransTime]
FROM TranItem
INNER JOIN [Transaction] on [TranItem].TransactionId = [Transaction].TransactionId
WHERE TranItemID between @IDStart and (@IDStart + @BatchSize)
set @IDStart = @IDStart + @BatchSize
END
GO
Yeilds the second execution plan : 2013-05-14 13_21_12.png
Could someone shed some light on this seemingly weird behaviour?
Cheers.
May 14, 2013 at 12:49 pm
Sounds like parameter sniffing and in this case it might be what Grant Fritchey calls bad parameter sniffing.
In the plan with the parameters right-click on the Update operator and select properties. Then look at the Parameter List which will show the compiled value and the RunTime value for each parameter. The compiled value is the sniffed value.
You might get better performance by using the OPTION(RECOMPILE) hint.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
May 14, 2013 at 4:13 pm
Thanks Jack,
I couldn't see the parameter list in the Update node properties, but the Option (RECOMPILE) did the trick.
Thanks for the cue, I'll read up on bad parameter sniffing.
May 14, 2013 at 6:20 pm
I notice you are using BETWEEN.
BETWEEN has a weak inequality (>= and <=) at both the upper and lower bound of the range.
This way, as you traverse the ranges, you will process twice values that are equal to either the upper or lower bound of each range.
Ideally you should use a strong inequality on one bound of the range and a strong inequality on the other.
I try to avoid using BETWEEN because it does not explicitly define the type of inequality.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply