Improving the Performance of UPDATE Statements in SQL Server

Introduction

Performing bulk updates in SQL Server can be a resource-intensive operation, particularly when dealing with large datasets. The efficiency of such operations is influenced by various factors, including batch size, disk I/O, memory usage, CPU utilization, and locking behavior. Larger batch sizes can reduce execution time by processing more records in fewer iterations, but they may also introduce challenges such as higher memory consumption, increased locking contention, and greater I/O pressure, which could negatively impact overall system performance.

This article explores the impact of different batch sizes on SQL Server performance during update operations. We ran tests using batch sizes of 1000, 5000, and 10000 records per update, and evaluated key performance metrics, including execution time, CPU utilization, memory usage, disk I/O statistics, and lock contention. The goal is to identify how batch size influences performance and to offer practical recommendations based on the test results.

Setting Up the Test Environment

To simulate a realistic workload, we created a test environment that included a database and a table called CustTransaction, populated with 1 million rows. The table’s design and data were based on the following SQL script:

USE MASTER;
GO
CREATE DATABASE UpdateDB;
GO
USE UpdateDB;
GO
CREATE TABLE CustTransaction
(
    id INT PRIMARY KEY IDENTITY(1,1), 
    col1 NVARCHAR(500),
    col2 NVARCHAR(500),
    col3 NVARCHAR(500),
    col4 NVARCHAR(500),
    col5 NVARCHAR(500),
    col6 NVARCHAR(500)
);
GO
INSERT INTO CustTransaction (col1, col2, col3, col4, col5, col6)
SELECT 
    'C1' + REPLICATE('0', 200),
    'C2' + REPLICATE('0', 200),
    'C3' + REPLICATE('0', 200),
    'C4' + REPLICATE('0', 200),
    'C5' + REPLICATE('0', 200),
    'C6' + REPLICATE('0', 200);
GO 1000000

This script creates a table with six large text columns and populates it with 1 million rows; the numeric argument to GO repeats the preceding single-row INSERT batch one million times. A dataset of this size lets us evaluate how the update operation performs under a realistic load. All tests were carried out in a staging environment, since performance tests should not be run on production systems without validation.
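
As a faster alternative to repeating a single-row batch one million times, the table can be populated in one set-based pass. The sketch below builds the row numbers by cross joining sys.all_objects, assuming that join yields at least one million rows on your instance (add another CROSS JOIN if it does not):

;WITH Numbers AS
(
    -- Generate 1,000,000 row numbers without a physical numbers table.
    SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO CustTransaction (col1, col2, col3, col4, col5, col6)
SELECT
    'C1' + REPLICATE('0', 200),
    'C2' + REPLICATE('0', 200),
    'C3' + REPLICATE('0', 200),
    'C4' + REPLICATE('0', 200),
    'C5' + REPLICATE('0', 200),
    'C6' + REPLICATE('0', 200)
FROM Numbers;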

Understanding the Update Methods

Before diving into the testing methodology and results, it's important to understand the three update methods being evaluated. These methods represent different approaches for updating large datasets, each with its own performance characteristics.

1. Row-by-Row Update

The Row-by-Row Update method involves updating each row individually. While it guarantees accuracy, this method is inefficient, especially for large datasets, due to the overhead of processing each row separately.

Code Example: Row-by-Row Update

DECLARE @Counter INT = 1;
WHILE @Counter <= 1000000
BEGIN
    UPDATE CustTransaction
    SET col1 = 'Updated ' + col1
    WHERE id = @Counter;
    SET @Counter = @Counter + 1;
END

How It Works: SQL Server reads and updates each row individually, leading to high CPU and I/O overhead, which significantly impacts performance.

Performance Considerations: This approach is inefficient for large datasets; the per-row overhead of parsing, locking, and logging each statement drives up CPU and I/O consumption and produces the longest execution times of the three methods.

2. Bulk Update

The Bulk Update method involves updating many records in a single operation. By processing records in bulk, it is typically faster than row-by-row updates.

Code Example: Bulk Update

UPDATE CustTransaction
SET col1 = 'Updated ' + col1
WHERE id <= 1000000;

How It Works: The update is performed in a single operation for all matching rows, which reduces the overhead of multiple individual update operations.

Performance Considerations: Although faster than the row-by-row method, bulk updates can still cause locking issues, especially with large datasets.

3. Batch Update

The Batch Update method involves updating a subset of rows in each operation. This approach helps reduce locking contention and allows better control over system resources.

Code Example: Batch Update

DECLARE @BatchSize INT = 10000;
DECLARE @RowCount INT = 1;
WHILE @RowCount > 0
BEGIN
    UPDATE TOP (@BatchSize) CustTransaction
    SET col1 = 'Updated ' + col1
    WHERE col1 NOT LIKE 'Updated%';
    SET @RowCount = @@ROWCOUNT;
END

How It Works: SQL Server updates a specified number of rows in each iteration, reducing the load on system resources and improving concurrency.

Performance Considerations: Batch updates strike a good balance between performance and resource utilization, reducing the strain of individual row updates and avoiding the issues of bulk updates.
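
A common variation on this loop wraps each batch in its own transaction and pauses briefly between batches, which shortens how long any one set of locks is held and gives other sessions a chance to get in. The sketch below uses an arbitrary 100 ms delay as an example value; tune or remove it for your workload:

DECLARE @BatchSize INT = 5000;
DECLARE @RowCount INT = 1;

WHILE @RowCount > 0
BEGIN
    BEGIN TRANSACTION;

    UPDATE TOP (@BatchSize) CustTransaction
    SET col1 = 'Updated ' + col1
    WHERE col1 NOT LIKE 'Updated%';

    SET @RowCount = @@ROWCOUNT;

    COMMIT TRANSACTION;

    -- Short pause so other sessions can acquire locks between batches.
    WAITFOR DELAY '00:00:00.100';
END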

Testing Methodology

We performed three tests based on different update methods: Row-by-Row Update, Bulk Update, and Batch Update (with varying batch sizes).
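
One straightforward way to capture comparable figures for each test (elapsed time, logical and physical reads) is to enable session-level statistics before running it; the Messages tab then reports the numbers per statement:

SET STATISTICS IO ON;   -- reports logical and physical reads per statement
SET STATISTICS TIME ON; -- reports CPU and elapsed time per statement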

Test 1: Row-by-Row Update

In this test, we updated one row at a time, executing the following SQL query for each row:

DECLARE @CurrentID INT = 1;
WHILE @CurrentID <= 1000000
BEGIN
    UPDATE CustTransaction
    SET col1 = 'Updated ' + col1
    WHERE id = @CurrentID;
    SET @CurrentID = @CurrentID + 1;
END

Results:

  • Execution Time: ~45 minutes
  • CPU Usage: High
  • Locks Held: Exclusive (X)
  • I/O: High

Test 2: Bulk Update

For the bulk update, the following query was used:

UPDATE CustTransaction
SET col1 = 'Updated ' + col1
WHERE col1 NOT LIKE 'Updated%';

Results:

  • Execution Time: ~15 minutes
  • CPU Usage: Moderate
  • Locks Held: Exclusive (X)
  • I/O: Moderate to High

Test 3: Batch Update

For the Batch Update, we tested three batch sizes: 1000, 5000, and 10000.

Batch Size 1000:

DECLARE @BatchSize INT = 1000;
WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) CustTransaction
    SET col1 = 'Updated ' + col1
    WHERE id <= 1000000 AND col1 NOT LIKE 'Updated%';
    
    IF @@ROWCOUNT = 0
        BREAK;
END

Results:

  • Execution Time: ~14 seconds
  • Logical Reads: 429,051
  • Physical Reads: 1,716
  • Locks Held: Exclusive (X)
  • Memory Usage: 319 MB available out of 4024 MB total

Batch Size 5000:

DECLARE @BatchSize INT = 5000;
WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) CustTransaction
    SET col1 = 'Updated ' + col1
    WHERE id <= 1000000 AND col1 NOT LIKE 'Updated%';
    
    IF @@ROWCOUNT = 0
        BREAK;
END

Results:

  • Execution Time: ~13.27 seconds
  • Logical Reads: 429,055
  • Physical Reads: 1,718
  • Locks Held: Exclusive (X)
  • Memory Usage: 319 MB available out of 4024 MB total

Batch Size 10000:

DECLARE @BatchSize INT = 10000;
WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) CustTransaction
    SET col1 = 'Updated ' + col1
    WHERE id <= 1000000 AND col1 NOT LIKE 'Updated%';
    
    IF @@ROWCOUNT = 0
        BREAK;
END

Results:

  • Execution Time: ~13.25 seconds
  • Logical Reads: 429,060
  • Physical Reads: 1,722
  • Locks Held: Exclusive (X)
  • Memory Usage: 319 MB available out of 4024 MB total

Memory Usage Clarification: Memory usage varies depending on the system workload and concurrent operations. The provided values are a snapshot during the test and may fluctuate depending on the system’s overall resource demands.
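
One way to capture a point-in-time memory snapshot like the figures reported above is the sys.dm_os_sys_memory DMV, which exposes server-level physical memory counters:

-- Server-level snapshot of total and available physical memory.
SELECT
    total_physical_memory_kb / 1024      AS total_memory_mb,
    available_physical_memory_kb / 1024  AS available_memory_mb,
    system_memory_state_desc
FROM sys.dm_os_sys_memory;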

Locking Behavior: Exclusive (X) locks were held during the update operations to ensure data consistency. These locks can cause blocking if other processes attempt to access the same rows concurrently, leading to potential performance degradation.
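
These locks can be observed from a second session while an update runs, for example with a query against sys.dm_tran_locks:

-- Run from a second session while the update executes to see its locks.
SELECT
    request_session_id,
    resource_type,
    request_mode,      -- X for the update's exclusive locks
    request_status,
    COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID('UpdateDB')
GROUP BY request_session_id, resource_type, request_mode, request_status;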

Performance Analysis and Results

The following table compares the performance metrics across the three update methods and various batch sizes (for Batch Updates):

Update Method      | Batch Size       | Execution Time | Logical Reads | Physical Reads | Locks Held    | CPU Usage | Memory Usage
Row-by-Row Update  | 1 row at a time  | ~45 minutes    | High          | High           | Exclusive (X) | High      | High
Bulk Update        | All at once      | ~15 minutes    | Moderate      | High           | Exclusive (X) | Moderate  | Moderate
Batch Update       | 1000             | ~14 seconds    | 429,051       | 1,716          | Exclusive (X) | Low       | 319 MB available
Batch Update       | 5000             | ~13.27 seconds | 429,055       | 1,718          | Exclusive (X) | Low       | 319 MB available
Batch Update       | 10000            | ~13.25 seconds | 429,060       | 1,722          | Exclusive (X) | Low       | 319 MB available

Key Insights:

  • Row-by-Row Update has the longest execution time and high CPU usage, making it inefficient for large datasets.
  • Bulk Update is faster but causes high I/O and moderate CPU usage. It’s suitable for high-volume data updates but can introduce locking contention.
  • Batch Update offers the best performance, balancing execution time, CPU usage, and memory consumption. Increasing the batch size beyond 5000 shows diminishing returns: in these tests, moving from 5000 to 10000 saved only about 0.02 seconds while logical and physical reads edged up slightly.

Key Considerations for Performance Optimization

  1. Locking and Lock Contention: Exclusive (X) locks ensure data consistency but can block other queries that try to access the same rows. Batch updates shorten how long each set of locks is held, reducing contention compared to Row-by-Row and Bulk updates. Be mindful of lock contention when running large updates in highly concurrent environments.
  2. Transaction Logging: Every update is fully logged, so larger batches produce larger individual transactions and more log I/O per commit, which can add disk pressure. Make sure the system has adequate I/O capacity and log space, and keep an eye on log usage while the update runs (a log-space monitoring sketch follows this list).
  3. Transaction Isolation Levels: The isolation level affects both locking and performance. The default READ COMMITTED level releases shared read locks quickly, while SERIALIZABLE holds range locks until the transaction ends and can increase contention; the exclusive locks taken by the update itself are held until commit regardless of isolation level. Adjust the isolation level based on your workload's concurrency needs.
  4. Test in a Staging Environment: Always test performance changes in a staging environment before implementing them in production. This helps you assess system behavior under load and adjust batch sizes or update methods according to real-world conditions.
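
One way to watch log consumption between batches is to query sys.dm_db_log_space_usage (available in SQL Server 2012 and later), either from the update session or from a second connection:

-- Show how much of the current database's transaction log is in use.
SELECT
    total_log_size_in_bytes / 1048576.0 AS total_log_size_mb,
    used_log_space_in_bytes / 1048576.0 AS used_log_space_mb,
    used_log_space_in_percent
FROM sys.dm_db_log_space_usage;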

Conclusion

Batch updates clearly outperform Row-by-Row and Bulk updates in execution time, CPU usage, and memory efficiency. In our tests, a batch size of around 5000 provided the best balance across the measured metrics; a batch size of 10000 improved execution time only marginally while slightly increasing reads, and very large batches may introduce additional memory pressure or longer lock durations.

When implementing batch updates, consider adjusting the batch size based on your system’s resources, workload, and concurrency levels. Always test in a staging environment to ensure the chosen approach aligns with your production needs.
