Performance comparison of using WHILE and CURSOR

  • Hi all,

    Recently I learned about CURSOR and I found most experts advise not to use it due to performance issue. However, I found using CURSOR is much faster than using a WHILE for looping. Thus, I would like to seek for some valuable advices.

    Basically I would like to loop each record in my record sets and perform some actions based on the selected field from each row.

    In order to simplify my scenario, I have created a sample here for comparison.

    1. Create a table variable with 10k records.

    SET NOCOUNT ON

    DECLARE @MyData TABLE(pk INT IDENTITY(1,1), id INT)

    DECLARE @index INT

    DECLARE @MyID INT

    DECLARE @startTime DATETIME

    DECLARE @endTime DATETIME

    -- * Insert 10000 records --

    SET @index = 1

    WHILE @index <= 10000
    BEGIN
    INSERT INTO @MyData VALUES (@index)
    SET @index = @index + 1

    END

    2. Using WHILE to loop each record and print execution time

    -- * Looping table with WHILE --

    SET @startTime = GETDATE()

    WHILE @index <= 10000
    BEGIN
    SET @MyID = (SELECT id FROM @MyData WHERE pk = @index)

    -- * Do something based the selected ID --

    SET @index = @index + 1

    END

    SET @endTime = GETDATE()

    -- * Print Execution Time of WHILE Loop --

    PRINT 'Execution Time of WHILE Loop (ms)'

    PRINT DATEDIFF(ms, @startTime, @endTime)

    3. Using CURSOR to loop each record and print execution time

    -- * Looping table with CURSOR --

    SET @startTime = GETDATE()

    DECLARE cur_Test CURSOR FAST_FORWARD

    FOR SELECT id FROM @MyData

    OPEN cur_Test

    FETCH NEXT FROM cur_Test INTO @MyID

    WHILE @@FETCH_STATUS = 0

    BEGIN

    -- * Do something based the selected ID --

    FETCH NEXT FROM cur_Test INTO @MyID;

    END

    CLOSE cur_Test

    DEALLOCATE cur_Test

    SET @endTime = GETDATE()

    -- * Print Execution Time of CURSOR --

    PRINT 'Execution Time of CURSOR (ms)'

    PRINT DATEDIFF(ms, @startTime, @endTime)

    SET NOCOUNT OFF

    And finally I got this result.

    Execution Time of WHILE Loop (ms)

    11063

    Execution Time of CURSOR (ms)

    296

    Kindly advise, thanks.

  • (Before we go on to set-based, let's tidy the RBAR a bit)

    SET NOCOUNT ON

    DECLARE @MyData TABLE(pk INT IDENTITY(1,1) PRIMARY KEY, id INT)

    DECLARE @index INT

    DECLARE @MyID INT

    DECLARE @startTime DATETIME

    DECLARE @endTime DATETIME

    -- * Insert 10000 records --

    SET @index = 1

    WHILE @index <= 10000

    BEGIN

    INSERT INTO @MyData VALUES (@index)

    SET @index = @index + 1

    END

    -- * Looping table with WHILE --

    SET @startTime = GETDATE()

    WHILE @index <= 10000

    BEGIN

    SELECT @MyID = id FROM @MyData WHERE pk = @index

    -- * Do something based the selected ID --

    SET @index = @index + 1

    END

    SET @endTime = GETDATE()

    -- * Print Execution Time of WHILE Loop --

    PRINT 'Execution Time of WHILE Loop (ms)'

    PRINT DATEDIFF(ms, @startTime, @endTime)

    -- * Looping table with CURSOR --

    SET @startTime = GETDATE()

    DECLARE cur_Test CURSOR LOCAL STATIC FORWARD_ONLY

    FOR SELECT id FROM @MyData

    OPEN cur_Test

    FETCH NEXT FROM cur_Test INTO @MyID

    WHILE @@FETCH_STATUS = 0

    BEGIN

    -- * Do something based the selected ID --

    FETCH NEXT FROM cur_Test INTO @MyID;

    END

    CLOSE cur_Test

    DEALLOCATE cur_Test

    SET @endTime = GETDATE()

    -- * Print Execution Time of CURSOR --

    PRINT 'Execution Time of CURSOR (ms)'

    PRINT DATEDIFF(ms, @startTime, @endTime)

    SET NOCOUNT OFF

    [font="Courier New"]Execution Time of WHILE Loop (ms)

    0

    Execution Time of CURSOR (ms)

    236

    [/font]

  • Paul White (6/26/2009)


    (Before we go on to set-based, let's tidy the RBAR a bit)

    Can we go on now? 😎

  • Hi Paul,

    Thanks for enlighten me, I din't notice by adding the PRIMARY KEY will make so much difference.:-D

  • yaya1899 (6/26/2009)


    Recently I learned about CURSOR and I found most experts advise not to use it due to performance issue. However, I found using CURSOR is much faster than using a WHILE for looping. Thus, I would like to seek for some valuable advices.

    Cursors may be faster than while loops (though they usually aren't), but both are way slower than writing code in a set-based manner. SQL's optimised for set-based processes, not row-by-row processing.

    Don't think about what you want to do to a row. This about operating on the entire resultset at once.

    If you explain what it is that you're actually trying to do, we can probably get you a set-based alternative that runs significantly faster.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi GilaMonster,

    Basically I have few tables to store system data of application, and those data maybe added/updated from time to time. Thus, I have a master script to store the latest data and will be executed on the client's server upon application update.

    What I'm currently doing is,

    1. Create a table variable with identical structure of the actual table.

    2. Create INSERT for each data.

    3. Compare the existing data and table variable with WHILE/CURSOR looping.

    4. Find the difference and do necessary action.

    Thanks.

  • Florian Reischl (6/26/2009)


    Paul White (6/26/2009)


    (Before we go on to set-based, let's tidy the RBAR a bit)

    Can we go on now? 😎

    Heh. Just trying to be fair about it. 😀

  • yaya1899 (6/26/2009)


    Hi GilaMonster,

    Basically I have few tables to store system data of application, and those data maybe added/updated from time to time. Thus, I have a master script to store the latest data and will be executed on the client's server upon application update.

    What I'm currently doing is,

    1. Create a table variable with identical structure of the actual table.

    2. Create INSERT for each data.

    3. Compare the existing data and table variable with WHILE/CURSOR looping.

    4. Find the difference and do necessary action.

    Thanks.

    That's not enough of a description for anyone to help with an alternative. Existing code would help, and table structures, and sample data. Read this to see the best way to post this to get quick responses.

    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    It really doesn't sound like a cursor's needed. From your description is sounds like a fairly standard IF EXISTS... Update... ELSE Insert

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • So, Paul gave me the GO...

    There is almost never a reason to work with cursors or loops in T-SQL. As Paul and Gail wrote SQL Server is made for set-based operations, not row-based operations.

    Let me show you a simple improvement of your posted code. You used a WHILE loop to create your sample data table:

    PRINT '---======================================'

    PRINT '-- Insert 1000 records with WHILE loop'

    DECLARE @MyData TABLE(pk INT IDENTITY(1,1), id INT)

    DECLARE @index INT

    DECLARE @MyID INT

    DECLARE @startTime DATETIME

    DECLARE @endTime DATETIME

    SELECT

    @index = 1,

    @startTime = GETDATE();

    WHILE @index <= 10000

    BEGIN

    INSERT INTO @MyData VALUES (@index)

    SET @index = @index + 1

    END;

    SELECT @endTime = GETDATE();

    PRINT 'Duration WHILE loop (ms): ' + CONVERT(VARCHAR(10), DATEDIFF(MILLISECOND, @startTime, @endTime))

    You should have a Numbers or Tally table in your database to handle those operations. Search this site for "Tally table", you will find a really good article by Jeff Moden which explains how those tables work in different solutions.

    Here a little snippet to create a Numbers table:

    DROP TABLE Numbers

    GO

    CREATE TABLE Numbers

    (

    Num INT NOT NULL

    )

    GO

    --INSERT INTO Numbers

    -- SELECT

    -- N

    -- FROM Tally

    INSERT INTO Numbers

    SELECT TOP(11000)

    ROW_NUMBER() OVER (ORDER BY (SELECT 1))

    FROM master.sys.all_columns c1

    CROSS JOIN master.sys.all_columns c2

    GO

    ALTER TABLE Numbers

    ADD CONSTRAINT PK_Numbers

    PRIMARY KEY CLUSTERED (Num)

    WITH (FILLFACTOR = 100)

    After you have a Numbers/Tally/Sequence/Whatever table you can use this to create your sample data like this:

    DECLARE @MyData TABLE(pk INT IDENTITY(1,1), id INT)

    DECLARE @startTime DATETIME

    DECLARE @endTime DATETIME

    PRINT '---======================================'

    PRINT '-- Insert 1000 records with Numbers table'

    SELECT

    @startTime = GETDATE();

    INSERT INTO @MyData

    SELECT TOP(10000)

    Num

    FROM Numbers

    SELECT @endTime = GETDATE();

    PRINT 'Duration Numbers table (ms): ' + CONVERT(VARCHAR(10), DATEDIFF(MILLISECOND, @startTime, @endTime))

    Sure, this is a very simple case but let me tell you there are really many solutions to be handled. Just read Jeff's article to see some of them.

    Let's compare your WHILE loop to create data with a Numbers table solution:

    ---======================================

    -- Insert 1000 records with WHILE loop

    Duration WHILE loop (ms): 253

    ---======================================

    -- Insert 1000 records with Numbers table

    Duration Numbers table (ms): 16

    yaya1899 (6/26/2009)


    1. Create a table variable with identical structure of the actual table.

    2. Create INSERT for each data.

    As Gail wrote, too less information to be sure that this is required. Maybe you can improve your performance by inserting as XML which usually works faster since it can be used for set-based operations.

    3. Compare the existing data and table variable with WHILE/CURSOR looping.

    4. Find the difference and do necessary action.

    As Gail wrote, this really looks like no need of cursors or loops. Usually you can handle this with joined UPDATE statements, OUTPUT and bulk INSERT operations.

    Please give more information to help us to help (:unsure: sounds strange?!?)

  • Nice work Flo 😉

  • Thanks Paul.

    @yaya1899:

    Just fount a failure in one part of my post. I filled the Numbers table with a Tally table :hehe:. Since you probably done have both this will generate an error. I changed the post and corrected this part...

  • yaya1899 (6/26/2009)


    Thanks for enlighten me, I din't notice by adding the PRIMARY KEY will make so much difference.:-D

    Absolutely. Otherwise, SQL Server has to scan the table to find the next value for @MyID. With the index, it can get the value immediately.

    BTW, I also changed the cursor definition slightly, since LOCAL cursors are faster (less slow?) than the (usually) default GLOBAL cursor. FORWARD_ONLY STATIC cursors may also be very slightly less slow than FAST_FORWARD cursors.

    Occasionally it is not worth the effort* to avoid a cursor; generally however it really really is 🙂

    Paul

    * For example, if you already have some complex logic bound up in nested row-by-row procedures and you have a small one-off job to do, a cursor might be appropriate.

  • Let's take this as example, my application has tables to store Country, State, City, Province. And each table has a fk to the other table.

    Example

    1. CountryPK will be a FK to State.

    2. State PK will be a FK to City.

    Assumed those data will be added from time to time. Thus, I have a default script to store all the latest data.

    Default Data Script

    -- * Step 1 - Create a table variable with identical structure of the actual table --

    DECLARE @Country TABLE (pk INT IDENTITY(1,1) PRIMARY KEY, newPK INT, CountryCode VARCHAR(60),

    CountryName VARCHAR(60), isNew BIT DEFAULT(0))

    DECLARE @recIndex INT

    DECLARE @recCount INT

    DECLARE @newRecPK INT

    DECLARE @CountryCode CHAR(3)

    -- * Step 2 - Create INSERT for each default data --

    INSERT INTO @Country(CountryCode, CountryName) VALUES ('UK', 'United Kingdom')

    INSERT INTO @Country(CountryCode, CountryName) VALUES ('JP', 'Japan')

    INSERT INTO @Country(CountryCode, CountryName) VALUES ('CN', 'China')

    INSERT INTO @Country(CountryCode, CountryName) VALUES ('HK', 'Hong Kong')

    -- * Step 3 - Compare Data --

    SET @recIndex = 1

    SET @recCount = (SELECT COUNT(*) FROM @Country)

    SET @newRecPK = ISNULL((SELECT MAX(CountryPK) + 1 FROM Country),1)

    WHILE @recIndex <= @recCount

    BEGIN

    SET @CountryCode = (SELECT TOP 1 CountryCode FROM @Country WHERE pk = @recIndex)

    IF NOT EXISTS (SELECT * FROM Country WHERE CountryCode = @CountryCode)

    BEGIN

    UPDATE @Country SET newPK = @newRecPK, isNew = 1 WHERE pk = @recIndex

    SET @newRecPK = @newRecPK + 1

    END

    SET @recIndex = @recIndex + 1

    END

    -- * Step 4 - Checking for State, City, Province ... --

    -- ...

    -- * Step 5 - Insert New Country, State, City, Province ... --

    INSERT INTO Country(CountryPK, CountryCode, CountryName)

    SELECT newPK, CountryCode, CountryName FROM @Country WHERE isNew = 1

    SELECT CAST(@@ROWCOUNT AS VARCHAR) + ' records added...'

    Actual Table in Database

    CREATE TABLE Country

    (CountryPK INT PRIMARY KEY, CountryCode CHAR(3), CountryName VARCHAR(60))

    The main reason I choose to use table variable as I need a place to store the newly added PK as they will be used in the further steps of update.

    Example

    China is added and has a PK 3, all the states in China will have a FK 3 when inserted.

    Perhaps my thought is too complicated, using a UPDATE/SELECT would be more efficient?

  • You can use a LEFT JOIN to get only the countries which are not yet exist within your destination table and a ROW_NUMBER to create a PK.

    CREATE TABLE #Country

    (CountryPK INT PRIMARY KEY, CountryCode CHAR(3), CountryName VARCHAR(60))

    INSERT INTO #Country(CountryPK, CountryCode, CountryName) VALUES (1, 'UK', 'United Kingdom')

    INSERT INTO #Country(CountryPK, CountryCode, CountryName) VALUES (2, 'JP', 'Japan')

    INSERT INTO #Country(CountryPK, CountryCode, CountryName) VALUES (3, 'HK', 'Hong Kong')

    GO

    -- * Step 1 - Create a table variable with identical structure of the actual table --

    DECLARE @Country TABLE (pk INT IDENTITY(1,1) PRIMARY KEY, newPK INT, CountryCode VARCHAR(60),

    CountryName VARCHAR(60), isNew BIT DEFAULT(0))

    DECLARE @lastPK INT

    -- * Step 2 - Create INSERT for each default data --

    INSERT INTO @Country(CountryCode, CountryName) VALUES ('UK', 'United Kingdom')

    INSERT INTO @Country(CountryCode, CountryName) VALUES ('JP', 'Japan')

    INSERT INTO @Country(CountryCode, CountryName) VALUES ('CN', 'China')

    INSERT INTO @Country(CountryCode, CountryName) VALUES ('HK', 'Hong Kong')

    SELECT @lastPK = MAX(CountryPK)

    FROM #Country

    SELECT @lastPK = ISNULL(@lastPK, 0)

    INSERT INTO #Country (

    CountryPK,

    CountryCode,

    CountryName

    )

    SELECT

    ROW_NUMBER() OVER (ORDER BY (SELECT 1)) + @lastPK,

    c1.CountryCode,

    c1.CountryName

    FROM @Country c1

    LEFT JOIN #Country c2 ON c1.CountryCode = c2.CountryCode

    WHERE c2.CountryCode IS NULL

    SELECT CAST(@@ROWCOUNT AS VARCHAR) + ' records added...'

    GO

    DROP TABLE #Country

    Hint: This kind of primary key generation is very risky! You will run into problems when other applications do the same work at same time. Why not using an IDENTITY in your Country table?

  • Hi Florian, thanks for your advice. My mind was thinking about using the procedural approach to solve the problem. Now, I learned that SQL is made for set-based operations.

    Thanks again to those who replied. 😀

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply