Get Set For Speed

Data is stored in individual rows in a table, but what is the best method of storing or retrieving that data? Many application developers would be quite familiar with a row-based query approach because it is very similar to the looping constructs found in traditional application development languages. You loop through a set of data and perform an action one element at a time. When this method is used to retrieve data from a data source, it means acting on one row of data at a time. I intentionally used the term "set" there to differentiate how set-based SQL operations work. Rather than process one row at a time, they work against all of the data at once. In the following examples, I will illustrate the differences in design and performance between cursors, while loops, and set operations.

Row-based: Cursors and While loops

Let's take a look at the general syntax to see how it works.

DECLARE MyCursor CURSOR
    FOR SELECT ...
OPEN MyCursor
FETCH NEXT FROM MyCursor
INTO @LocalVariable
WHILE @@FETCH_STATUS = 0
BEGIN
    -- do something
    FETCH NEXT FROM MyCursor
    INTO @LocalVariable
END
CLOSE MyCursor
DEALLOCATE MyCursor

We know SELECT is the basic data retrieval command in the SQL language, and here we see it used in the definition of the cursor. Each time we execute the statement "FETCH NEXT", we are essentially re-executing that SELECT statement to get the next top result row. Here is a very similar approach without defining a cursor:

DECLARE @var varchar(50) = '';
WHILE @var is not null
BEGIN
    SET @var =
    (
        SELECT TOP(1) column1 FROM table1
        WHERE column1 > @var
        ORDER BY column1
    )
    IF @var is null
        BREAK;
    -- do something
END

Set-based

The idea behind a set-based approach is to attempt to retrieve the same final result but with as few set requests as possible. In fact, the ideal scenario is to design a solution that allows the entire final result to be retrieved from a single dataset.

One example to explain them all

Let's take a look at a simple example using AdventureWorks. We want to report the total number of items ordered by all customers during each month of 2007. The first part of the script creates the temp tables. The next 3 parts run the report using a cursor, then a while loop, and finally a single set operation.

BEGIN; -- setup
    IF OBJECT_ID('tempdb..#Months') is not null
        DROP TABLE #Months;
    CREATE TABLE #Months
    (
        TheMonthNumber char(2)
        , TheMonthName varchar(20)
    );
    INSERT #Months
        (TheMonthNumber, TheMonthName)
        VALUES
          ('01', 'January')
        , ('02', 'February')
        , ('03', 'March')
        , ('04', 'April')
        , ('05', 'May')
        , ('06', 'June')
        , ('07', 'July')
        , ('08', 'August')
        , ('09', 'September')
        , ('10', 'October')
        , ('11', 'November')
        , ('12', 'December')
    ;
    IF OBJECT_ID('tempdb..#MonthlyItemsOrdered') is not null
        DROP TABLE #MonthlyItemsOrdered;
    CREATE TABLE #MonthlyItemsOrdered
    (
        TheMonth varchar(20)
        , TotalOrdered int
    );
END;
GO
PRINT 'Begin cursor';
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
GO
DECLARE @StartTime datetime2(7), @EndTime datetime2(7);
BEGIN; -- cursor
    SET @StartTime = SYSDATETIME();
    TRUNCATE TABLE #MonthlyItemsOrdered;
    DECLARE @MonthNumber char(2), @MonthName varchar(20);
    DECLARE @RangeStart date, @RangeEnd date;
    DECLARE MyCursor CURSOR FAST_FORWARD FOR
        SELECT m.TheMonthNumber, m.TheMonthName
        FROM #Months m
    ;
    OPEN MyCursor
    FETCH NEXT FROM MyCursor
    INTO @MonthNumber, @MonthName
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @RangeStart = CONVERT(date, '2007' + @MonthNumber + '01');
        SET @RangeEnd = DATEADD(day, -1, DATEADD(month, 1, @RangeStart));
        INSERT #MonthlyItemsOrdered
            (TheMonth, TotalOrdered)
            SELECT
                CONCAT(@MonthNumber, ' - ', @MonthName)
                , SUM(sod.OrderQty)
            FROM AdventureWorks.Sales.SalesOrderDetail sod
                INNER JOIN AdventureWorks.Sales.SalesOrderHeader soh
                    ON soh.SalesOrderID = sod.SalesOrderID
            WHERE soh.OrderDate BETWEEN @RangeStart AND @RangeEnd
        ;
        FETCH NEXT FROM MyCursor 
        INTO @MonthNumber, @MonthName
    END
    CLOSE MyCursor
    DEALLOCATE MyCursor
    SET @EndTime = SYSDATETIME();
    SELECT * FROM #MonthlyItemsOrdered;
    SELECT Style = 'cursor', ElapsedMilliseconds = DATEDIFF(ms, @StartTime, @EndTime);
END;
GO
PRINT 'Begin while loop';
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
GO
DECLARE @StartTime datetime2(7), @EndTime datetime2(7);
BEGIN; -- while loop
    SET @StartTime = SYSDATETIME();
    TRUNCATE TABLE #MonthlyItemsOrdered;
    DECLARE @MonthNumber char(2), @MonthName varchar(20);
    DECLARE @RangeStart date, @RangeEnd date;
    SET @MonthNumber = '00';
    WHILE 1=1 -- break handled within
    BEGIN;
        SET @MonthName = '';
        SELECT TOP(1)
            @MonthNumber = m.TheMonthNumber
            , @MonthName = m.TheMonthName
        FROM #Months m
        WHERE m.TheMonthNumber > @MonthNumber
        ORDER BY TheMonthNumber
        ;
        IF @MonthName = ''
            BREAK;
        SET @RangeStart = CONVERT(date, '2007' + @MonthNumber + '01');
        SET @RangeEnd = DATEADD(day, -1, DATEADD(month, 1, @RangeStart));
        INSERT #MonthlyItemsOrdered
            (TheMonth, TotalOrdered)
            SELECT
                CONCAT(@MonthNumber, ' - ', @MonthName)
                , SUM(sod.OrderQty)
            FROM AdventureWorks.Sales.SalesOrderDetail sod
                INNER JOIN AdventureWorks.Sales.SalesOrderHeader soh
                    ON soh.SalesOrderID = sod.SalesOrderID
            WHERE soh.OrderDate BETWEEN @RangeStart AND @RangeEnd
        ;
    END;
    SET @EndTime = SYSDATETIME();
    SELECT * FROM #MonthlyItemsOrdered;
    SELECT Style = 'while loop', ElapsedMilliseconds = DATEDIFF(ms, @StartTime, @EndTime);
END;
GO
PRINT 'Begin set';
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
GO
DECLARE @StartTime datetime2(7), @EndTime datetime2(7);
BEGIN; -- set
    SET @StartTime = SYSDATETIME();
    INSERT #MonthlyItemsOrdered
        (TheMonth, TotalOrdered)
        SELECT
            CONCAT(RIGHT(CONCAT('00', MONTH(soh.OrderDate)), 2), ' - ', DATENAME(month, soh.OrderDate))
            , SUM(sod.OrderQty)
        FROM AdventureWorks.Sales.SalesOrderDetail sod
            INNER JOIN AdventureWorks.Sales.SalesOrderHeader soh
                ON soh.SalesOrderID = sod.SalesOrderID
        WHERE soh.OrderDate BETWEEN CONVERT(date, '20070101') AND CONVERT(date, '20071231')
        GROUP BY CONCAT(RIGHT(CONCAT('00', MONTH(soh.OrderDate)), 2), ' - ', DATENAME(month, soh.OrderDate))
    ;
    SET @EndTime = SYSDATETIME();
    SELECT * FROM #MonthlyItemsOrdered;
    SELECT Style = 'set', ElapsedMilliseconds = DATEDIFF(ms, @StartTime, @EndTime);
END;

Performance Results

	ElapsedMilliseconds
Style	Test 1	Test 2	Test 3	Test 4	Test 5
cursor	451	451	411	457	410
while loop	354	366	364	367	360
set	258	300	248	196	202

As you can see from the code, not only is the set-based solution easier to read, it only issues a single statement to the engine as opposed to one statement per row. Looking at the performance results, we see that the set-based solution is consistently much faster than either of the other two.

Conclusion

It turns out what’s good for the goose is not always good for the gander. Those loops that work so well in C# application code aren’t so great in SQL.

It is important to note that set-based design will not automatically result in faster performance, especially if that particular query includes subqueries in the SELECT clause. For example, a query like this:

SELECT a
    , (SELECT x FROM table2) AS field1
    , (SELECT y FROM table3) AS field2
    , (SELECT z FROM table4) AS field3
FROM table1

looks like a set-based approach but really isn't because each of those subqueries will have to run for each row returned by the main query. For more information on ways that some attempted set-based solutions really aren't truly set-based after all, read Jeff Moden's article here: http://www.sqlservercentral.com/articles/T-SQL/61539/ entitled "Hidden RBAR: Triangular Joins".

Row-based: Cursors and While loops

Set-based

One example to explain them all

Performance Results

Conclusion

Resources

Rate

Share

Categories

Share

Rate

Get Set For Speed

Row-based: Cursors and While loops

Set-based

One example to explain them all

Performance Results

Conclusion

Resources

Rate

Share

Categories

Share

Rate

Related content

Set-Based String Splitting table function

PatReplace8K

PatExclude8K