Improving the Performance of UPDATE Statements in SQL Server

  • Comments posted to this topic are about the item Improving the Performance of UPDATE Statements in SQL Server

  • You can speed up your WHILE loops by simply doing:

    WHILE 1 = 1 -- endless loop
    BEGIN
        UPDATE TOP (@batchsize) ... WHERE ...

        IF @@ROWCOUNT < @batchsize BREAK; -- leave the loop
    END

    This code is a bit simpler than your original WHILE, and it prevents the very last iteration, in which zero rows are updated (because you are already done). Even an empty iteration may take several minutes on very large tables, particularly when there is no suitable index to help decide which rows need to be updated.
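    Filled out with a concrete table and predicate, that pattern might look like the following sketch. The table `dbo.CustTransaction` and the `WHERE` clause are assumptions borrowed from the article's example, not tested code:

```sql
DECLARE @batchsize int = 5000;

WHILE 1 = 1  -- endless loop; the BREAK below is the only exit
BEGIN
    -- Update at most @batchsize not-yet-updated rows per iteration
    UPDATE TOP (@batchsize) dbo.CustTransaction
       SET col1 = 'Updated ' + col1
     WHERE col1 NOT LIKE 'Updated%';

    -- Fewer rows than the batch size means this was the last batch
    IF @@ROWCOUNT < @batchsize BREAK;
END;
```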

    God is real, unless declared integer.

  • I think this is an AI-generated article that contains lots of errors, posted by a person with no real knowledge who was not even willing to test the results themselves.

    It makes absolutely no sense that the total execution time of a loop is 14 seconds while it is 15 minutes for the bulk update. Nor that the page read count is equal regardless of whether your batch size is 1000 or 5000. That would only make sense if you were comparing the time/IO of a single batch iteration instead of the total over all necessary executions.

    Since there is no suitable index, SQL Server needs to do a full table scan for each iteration, so the number of pages read should be five times higher in the batchsize = 1000 scenario than in the 5000 scenario.

    PS: I hope the "author" will not get any money for this article, so as to discourage other people from posting low-quality stuff.

    God is real, unless declared integer.

  • Thanks for your comments. I understand your points and I would like to clarify a couple of things.

    Regarding the execution time difference: batch updates are faster because they reduce locking and system contention by processing records in smaller chunks. Bulk updates process everything at once, which can put more strain on the system and take longer. I ran these tests myself in SSMS, and the results were the same as what I posted in my article.

    As for the read counts, I see why that might seem odd. I think SQL Server is optimizing things behind the scenes with caching or internal processing, but you are right that it's worth digging into. I will look into it and add more to the article.

    You are absolutely right about indexing: if the table is not properly indexed, performance will suffer. That's a great point, and I will make sure to mention it in future articles.

    Thanks for the feedback; it will help me make the article better.

  • It might be a good idea to SET NOCOUNT ON.

    I don't know how many resources that takes, but I don't need to see 10,000 messages.
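    As a minimal sketch (the table name is hypothetical, reusing the article's example), the setting goes at the top of the batch and suppresses the "(N rows affected)" message that every statement in the loop would otherwise emit:

```sql
SET NOCOUNT ON;  -- suppress the per-statement "(N rows affected)" messages

WHILE 1 = 1
BEGIN
    UPDATE TOP (1000) dbo.CustTransaction  -- hypothetical table
       SET col1 = 'Updated ' + col1
     WHERE col1 NOT LIKE 'Updated%';

    IF @@ROWCOUNT < 1000 BREAK;
END;
```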

  • There is a lot to unpack here, but I'd just like to add the importance of indexing the predicate used in an UPDATE or DELETE statement.

    No mention of indexing was made in the article.

    " ...
    WHERE col1 NOT LIKE 'Updated%'; "
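    As a hedged sketch (the index and table names are hypothetical), indexing the predicate column might look like this. Note that the negated predicate `NOT LIKE 'Updated%'` is hard to seek on, so such an index mainly gives SQL Server a narrower structure to scan than the base table:

```sql
-- Sketch: index the column used in the UPDATE's WHERE clause.
-- A status flag column with a filtered index would be a more
-- seek-friendly design, since filtered index predicates do not
-- allow LIKE at all.
CREATE NONCLUSTERED INDEX IX_CustTransaction_col1
    ON dbo.CustTransaction (col1);
```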


    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I agree. This seems like an AI-generated article, and unfortunately it made it through.

  • Noman072 wrote:

    Thanks for your comments. I understand your points and I would like to clarify a couple of things.

    Regarding the execution time difference: batch updates are faster because they reduce locking and system contention by processing records in smaller chunks. Bulk updates process everything at once, which can put more strain on the system and take longer. I ran these tests myself in SSMS, and the results were the same as what I posted in my article.

    As for the read counts, I see why that might seem odd. I think SQL Server is optimizing things behind the scenes with caching or internal processing, but you are right that it's worth digging into. I will look into it and add more to the article.

    You are absolutely right about indexing: if the table is not properly indexed, performance will suffer. That's a great point, and I will make sure to mention it in future articles.

    Thanks for the feedback; it will help me make the article better.

    I appreciate anyone who steps up to the plate to share knowledge about SQL, but there's something seriously wrong with your testing. First, I agree with Tomas Franz about the errors. The GO 1000000; fails because of the semicolon.

    Second, it's a bit ironic that someone would use such a RBAR method to populate a test table in an article that's about performance.  There are at least 3 other ways to do the same thing in about 20 seconds on a physical hard disk and about 3 seconds on SSDs.

    Here's probably the lowest time investment method to do such a thing.  You should also be careful not to have implicit conversions, like you currently do, which will also slow things down.  And if you have SQL Server 2022, do NOT use the GENERATE_SERIES() function in place of the CROSS JOIN because it will negate the power of "Minimal Logging", which the following code also uses...

    INSERT INTO dbo.CustTransaction WITH (TABLOCK)
            (col1, col2, col3, col4, col5, col6)
    SELECT TOP 1000000
            N'C1' + REPLICATE(N'0',200),
            N'C2' + REPLICATE(N'0',200),
            N'C3' + REPLICATE(N'0',200),
            N'C4' + REPLICATE(N'0',200),
            N'C5' + REPLICATE(N'0',200),
            N'C6' + REPLICATE(N'0',200)
      FROM sys.all_columns ac1
     CROSS JOIN sys.all_columns ac2
    ;

    Third, even when using a physical hard drive for the test instead of SSDs, the single "all at once" bulk update only takes 15 seconds to execute... not 15 minutes. On SSDs, it takes less than 4 seconds.

    What I recommend is that you go back, revisit your code, and do some retesting. If there are virtually no changes in the times, especially that 15-minute result for the single "all at once" bulk update, then you either have a very old, possibly 32-bit machine, or there is something seriously wrong with it. If you do find that you made a mistake, especially with that single "all at once" bulk update, my recommendation is to submit the corrections to the WebMaster so the article can be republished.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
