Performance comparison of using WHILE and CURSOR

Question

Post reply

Performance comparison of using WHILE and CURSOR

yaya1899

SSC Eights!

Points: 937
More actions
June 26, 2009 at 3:42 am

#205862

Hi all,
Recently I learned about CURSOR and I found most experts advise not to use it due to performance issue. However, I found using CURSOR is much faster than using a WHILE for looping. Thus, I would like to seek for some valuable advices.
Basically I would like to loop each record in my record sets and perform some actions based on the selected field from each row.
In order to simplify my scenario, I have created a sample here for comparison.
1. Create a table variable with 10k records.
SET NOCOUNT ON
DECLARE @MyData TABLE(pk INT IDENTITY(1,1), id INT)
DECLARE @index INT
DECLARE @MyID INT
DECLARE @startTime DATETIME
DECLARE @endTime DATETIME
-- * Insert 10000 records --
SET @index = 1
WHILE @index <= 10000 BEGIN INSERT INTO @MyData VALUES (@index) SET @index = @index + 1
END
2. Using WHILE to loop each record and print execution time
-- * Looping table with WHILE --
SET @startTime = GETDATE()
WHILE @index <= 10000 BEGIN SET @MyID = (SELECT id FROM @MyData WHERE pk = @index)
-- * Do something based the selected ID --
SET @index = @index + 1
END
SET @endTime = GETDATE()
-- * Print Execution Time of WHILE Loop --
PRINT 'Execution Time of WHILE Loop (ms)'
PRINT DATEDIFF(ms, @startTime, @endTime)
3. Using CURSOR to loop each record and print execution time
-- * Looping table with CURSOR --
SET @startTime = GETDATE()
DECLARE cur_Test CURSOR FAST_FORWARD
FOR SELECT id FROM @MyData
OPEN cur_Test
FETCH NEXT FROM cur_Test INTO @MyID
WHILE @@FETCH_STATUS = 0
BEGIN
-- * Do something based the selected ID --
FETCH NEXT FROM cur_Test INTO @MyID;
END
CLOSE cur_Test
DEALLOCATE cur_Test
SET @endTime = GETDATE()
-- * Print Execution Time of CURSOR --
PRINT 'Execution Time of CURSOR (ms)'
PRINT DATEDIFF(ms, @startTime, @endTime)
SET NOCOUNT OFF
And finally I got this result.
Execution Time of WHILE Loop (ms)
11063
Execution Time of CURSOR (ms)
296
Kindly advise, thanks.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Paul White SSC Guru Points: 150468 More actions · Answer 1

(Before we go on to set-based, let's tidy the RBAR a bit)

SET NOCOUNT ON

DECLARE @MyData TABLE(pk INT IDENTITY(1,1) PRIMARY KEY, id INT)

DECLARE @index INT

DECLARE @MyID INT

DECLARE @startTime DATETIME

DECLARE @endTime DATETIME

-- * Insert 10000 records --

SET @index = 1

WHILE @index <= 10000

BEGIN

INSERT INTO @MyData VALUES (@index)

SET @index = @index + 1

END

-- * Looping table with WHILE --

SET @startTime = GETDATE()

WHILE @index <= 10000

BEGIN

SELECT @MyID = id FROM @MyData WHERE pk = @index

-- * Do something based the selected ID --

SET @index = @index + 1

END

SET @endTime = GETDATE()

-- * Print Execution Time of WHILE Loop --

PRINT 'Execution Time of WHILE Loop (ms)'

PRINT DATEDIFF(ms, @startTime, @endTime)

-- * Looping table with CURSOR --

SET @startTime = GETDATE()

DECLARE cur_Test CURSOR LOCAL STATIC FORWARD_ONLY

FOR SELECT id FROM @MyData

OPEN cur_Test

FETCH NEXT FROM cur_Test INTO @MyID

WHILE @@FETCH_STATUS = 0

BEGIN

-- * Do something based the selected ID --

FETCH NEXT FROM cur_Test INTO @MyID;

END

CLOSE cur_Test

DEALLOCATE cur_Test

SET @endTime = GETDATE()

-- * Print Execution Time of CURSOR --

PRINT 'Execution Time of CURSOR (ms)'

PRINT DATEDIFF(ms, @startTime, @endTime)

SET NOCOUNT OFF

[font="Courier New"]Execution Time of WHILE Loop (ms)

0

Execution Time of CURSOR (ms)

236

[/font]

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 2

Paul White (6/26/2009)
(Before we go on to set-based, let's tidy the RBAR a bit)

Can we go on now? 😎

yaya1899 SSC Eights! Points: 937 More actions · Answer 3

Hi Paul,

Thanks for enlighten me, I din't notice by adding the PRIMARY KEY will make so much difference.:-D

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 4

yaya1899 (6/26/2009)
Recently I learned about CURSOR and I found most experts advise not to use it due to performance issue. However, I found using CURSOR is much faster than using a WHILE for looping. Thus, I would like to seek for some valuable advices.

Cursors may be faster than while loops (though they usually aren't), but both are way slower than writing code in a set-based manner. SQL's optimised for set-based processes, not row-by-row processing.

Don't think about what you want to do to a row. This about operating on the entire resultset at once.

If you explain what it is that you're actually trying to do, we can probably get you a set-based alternative that runs significantly faster.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

yaya1899 SSC Eights! Points: 937 More actions · Answer 5

Hi GilaMonster,

Basically I have few tables to store system data of application, and those data maybe added/updated from time to time. Thus, I have a master script to store the latest data and will be executed on the client's server upon application update.

What I'm currently doing is,

1. Create a table variable with identical structure of the actual table.

2. Create INSERT for each data.

3. Compare the existing data and table variable with WHILE/CURSOR looping.

4. Find the difference and do necessary action.

Thanks.

Paul White SSC Guru Points: 150468 More actions · Answer 6

Florian Reischl (6/26/2009)
Paul White (6/26/2009)
(Before we go on to set-based, let's tidy the RBAR a bit)
Can we go on now? 😎

Heh. Just trying to be fair about it. 😀

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 7

yaya1899 (6/26/2009)
Hi GilaMonster,
Basically I have few tables to store system data of application, and those data maybe added/updated from time to time. Thus, I have a master script to store the latest data and will be executed on the client's server upon application update.
What I'm currently doing is,
1. Create a table variable with identical structure of the actual table.
2. Create INSERT for each data.
3. Compare the existing data and table variable with WHILE/CURSOR looping.
4. Find the difference and do necessary action.
Thanks.

That's not enough of a description for anyone to help with an alternative. Existing code would help, and table structures, and sample data. Read this to see the best way to post this to get quick responses.

http://www.sqlservercentral.com/articles/Best+Practices/61537/

It really doesn't sound like a cursor's needed. From your description is sounds like a fairly standard IF EXISTS... Update... ELSE Insert

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 8

So, Paul gave me the GO...

There is almost never a reason to work with cursors or loops in T-SQL. As Paul and Gail wrote SQL Server is made for set-based operations, not row-based operations.

Let me show you a simple improvement of your posted code. You used a WHILE loop to create your sample data table:

PRINT '---======================================'

PRINT '-- Insert 1000 records with WHILE loop'

DECLARE @MyData TABLE(pk INT IDENTITY(1,1), id INT)

DECLARE @index INT

DECLARE @MyID INT

DECLARE @startTime DATETIME

DECLARE @endTime DATETIME

SELECT

@index = 1,

@startTime = GETDATE();

WHILE @index <= 10000

BEGIN

INSERT INTO @MyData VALUES (@index)

SET @index = @index + 1

END;

SELECT @endTime = GETDATE();

PRINT 'Duration WHILE loop (ms): ' + CONVERT(VARCHAR(10), DATEDIFF(MILLISECOND, @startTime, @endTime))

You should have a Numbers or Tally table in your database to handle those operations. Search this site for "Tally table", you will find a really good article by Jeff Moden which explains how those tables work in different solutions.

Here a little snippet to create a Numbers table:

DROP TABLE Numbers

GO

CREATE TABLE Numbers

(

Num INT NOT NULL

)

GO

--INSERT INTO Numbers

-- SELECT

-- N

-- FROM Tally

INSERT INTO Numbers

SELECT TOP(11000)

ROW_NUMBER() OVER (ORDER BY (SELECT 1))

FROM master.sys.all_columns c1

CROSS JOIN master.sys.all_columns c2

GO

ALTER TABLE Numbers

ADD CONSTRAINT PK_Numbers

PRIMARY KEY CLUSTERED (Num)

WITH (FILLFACTOR = 100)

After you have a Numbers/Tally/Sequence/Whatever table you can use this to create your sample data like this:

DECLARE @MyData TABLE(pk INT IDENTITY(1,1), id INT)

DECLARE @startTime DATETIME

DECLARE @endTime DATETIME

PRINT '---======================================'

PRINT '-- Insert 1000 records with Numbers table'

SELECT

@startTime = GETDATE();

INSERT INTO @MyData

SELECT TOP(10000)

Num

FROM Numbers

SELECT @endTime = GETDATE();

PRINT 'Duration Numbers table (ms): ' + CONVERT(VARCHAR(10), DATEDIFF(MILLISECOND, @startTime, @endTime))

Sure, this is a very simple case but let me tell you there are really many solutions to be handled. Just read Jeff's article to see some of them.

Let's compare your WHILE loop to create data with a Numbers table solution:

---======================================

-- Insert 1000 records with WHILE loop

Duration WHILE loop (ms): 253

---======================================

-- Insert 1000 records with Numbers table

Duration Numbers table (ms): 16

yaya1899 (6/26/2009)
1. Create a table variable with identical structure of the actual table.
2. Create INSERT for each data.

As Gail wrote, too less information to be sure that this is required. Maybe you can improve your performance by inserting as XML which usually works faster since it can be used for set-based operations.

3. Compare the existing data and table variable with WHILE/CURSOR looping.
4. Find the difference and do necessary action.

As Gail wrote, this really looks like no need of cursors or loops. Usually you can handle this with joined UPDATE statements, OUTPUT and bulk INSERT operations.

Please give more information to help us to help (:unsure: sounds strange?!?)

Paul White SSC Guru Points: 150468 More actions · Answer 9

Nice work Flo 😉

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 10

Thanks Paul.

@yaya1899:

Just fount a failure in one part of my post. I filled the Numbers table with a Tally table :hehe:. Since you probably done have both this will generate an error. I changed the post and corrected this part...

Paul White SSC Guru Points: 150468 More actions · Answer 11

yaya1899 (6/26/2009)
Thanks for enlighten me, I din't notice by adding the PRIMARY KEY will make so much difference.:-D

Absolutely. Otherwise, SQL Server has to scan the table to find the next value for @MyID. With the index, it can get the value immediately.

BTW, I also changed the cursor definition slightly, since LOCAL cursors are faster (less slow?) than the (usually) default GLOBAL cursor. FORWARD_ONLY STATIC cursors may also be very slightly less slow than FAST_FORWARD cursors.

Occasionally it is not worth the effort* to avoid a cursor; generally however it really really is 🙂

Paul

* For example, if you already have some complex logic bound up in nested row-by-row procedures and you have a small one-off job to do, a cursor might be appropriate.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

yaya1899 SSC Eights! Points: 937 More actions · Answer 12

Let's take this as example, my application has tables to store Country, State, City, Province. And each table has a fk to the other table.

Example
1. CountryPK will be a FK to State.
2. State PK will be a FK to City.

Assumed those data will be added from time to time. Thus, I have a default script to store all the latest data.

Default Data Script

-- * Step 1 - Create a table variable with identical structure of the actual table --

DECLARE @Country TABLE (pk INT IDENTITY(1,1) PRIMARY KEY, newPK INT, CountryCode VARCHAR(60),

CountryName VARCHAR(60), isNew BIT DEFAULT(0))

DECLARE @recIndex INT

DECLARE @recCount INT

DECLARE @newRecPK INT

DECLARE @CountryCode CHAR(3)

-- * Step 2 - Create INSERT for each default data --

INSERT INTO @Country(CountryCode, CountryName) VALUES ('UK', 'United Kingdom')

INSERT INTO @Country(CountryCode, CountryName) VALUES ('JP', 'Japan')

INSERT INTO @Country(CountryCode, CountryName) VALUES ('CN', 'China')

INSERT INTO @Country(CountryCode, CountryName) VALUES ('HK', 'Hong Kong')

-- * Step 3 - Compare Data --

SET @recIndex = 1

SET @recCount = (SELECT COUNT(*) FROM @Country)

SET @newRecPK = ISNULL((SELECT MAX(CountryPK) + 1 FROM Country),1)

WHILE @recIndex <= @recCount

BEGIN

SET @CountryCode = (SELECT TOP 1 CountryCode FROM @Country WHERE pk = @recIndex)

IF NOT EXISTS (SELECT * FROM Country WHERE CountryCode = @CountryCode)

BEGIN

UPDATE @Country SET newPK = @newRecPK, isNew = 1 WHERE pk = @recIndex

SET @newRecPK = @newRecPK + 1

END

SET @recIndex = @recIndex + 1

END

-- * Step 4 - Checking for State, City, Province ... --

-- ...

-- * Step 5 - Insert New Country, State, City, Province ... --

INSERT INTO Country(CountryPK, CountryCode, CountryName)

SELECT newPK, CountryCode, CountryName FROM @Country WHERE isNew = 1

SELECT CAST(@@ROWCOUNT AS VARCHAR) + ' records added...'

Actual Table in Database

CREATE TABLE Country

(CountryPK INT PRIMARY KEY, CountryCode CHAR(3), CountryName VARCHAR(60))

The main reason I choose to use table variable as I need a place to store the newly added PK as they will be used in the further steps of update.

Example
China is added and has a PK 3, all the states in China will have a FK 3 when inserted.

Perhaps my thought is too complicated, using a UPDATE/SELECT would be more efficient?

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 13

You can use a LEFT JOIN to get only the countries which are not yet exist within your destination table and a ROW_NUMBER to create a PK.

CREATE TABLE #Country

(CountryPK INT PRIMARY KEY, CountryCode CHAR(3), CountryName VARCHAR(60))

INSERT INTO #Country(CountryPK, CountryCode, CountryName) VALUES (1, 'UK', 'United Kingdom')

INSERT INTO #Country(CountryPK, CountryCode, CountryName) VALUES (2, 'JP', 'Japan')

INSERT INTO #Country(CountryPK, CountryCode, CountryName) VALUES (3, 'HK', 'Hong Kong')

GO

-- * Step 1 - Create a table variable with identical structure of the actual table --

DECLARE @Country TABLE (pk INT IDENTITY(1,1) PRIMARY KEY, newPK INT, CountryCode VARCHAR(60),

CountryName VARCHAR(60), isNew BIT DEFAULT(0))

DECLARE @lastPK INT

-- * Step 2 - Create INSERT for each default data --

INSERT INTO @Country(CountryCode, CountryName) VALUES ('UK', 'United Kingdom')

INSERT INTO @Country(CountryCode, CountryName) VALUES ('JP', 'Japan')

INSERT INTO @Country(CountryCode, CountryName) VALUES ('CN', 'China')

INSERT INTO @Country(CountryCode, CountryName) VALUES ('HK', 'Hong Kong')

SELECT @lastPK = MAX(CountryPK)

FROM #Country

SELECT @lastPK = ISNULL(@lastPK, 0)

INSERT INTO #Country (

CountryPK,

CountryCode,

CountryName

)

SELECT

ROW_NUMBER() OVER (ORDER BY (SELECT 1)) + @lastPK,

c1.CountryCode,

c1.CountryName

FROM @Country c1

LEFT JOIN #Country c2 ON c1.CountryCode = c2.CountryCode

WHERE c2.CountryCode IS NULL

SELECT CAST(@@ROWCOUNT AS VARCHAR) + ' records added...'

GO

DROP TABLE #Country

Hint: This kind of primary key generation is very risky! You will run into problems when other applications do the same work at same time. Why not using an IDENTITY in your Country table?

yaya1899 SSC Eights! Points: 937 More actions · Answer 14

Hi Florian, thanks for your advice. My mind was thinking about using the procedural approach to solve the problem. Now, I learned that SQL is made for set-based operations.

Thanks again to those who replied. 😀