June 26, 2009 at 3:42 am
Hi all,
Recently I learned about CURSOR and found that most experts advise against using it due to performance issues. However, I found that using a CURSOR is much faster than using a WHILE loop. Thus, I would like to seek some valuable advice.
Basically I would like to loop through each record in my record set and perform some actions based on a selected field from each row.
In order to simplify my scenario, I have created a sample here for comparison.
1. Create a table variable with 10k records.
SET NOCOUNT ON
DECLARE @MyData TABLE(pk INT IDENTITY(1,1), id INT)
DECLARE @index INT
DECLARE @MyID INT
DECLARE @startTime DATETIME
DECLARE @endTime DATETIME
-- * Insert 10000 records --
SET @index = 1
WHILE @index <= 10000
BEGIN
INSERT INTO @MyData VALUES (@index)
SET @index = @index + 1
END
2. Using WHILE to loop each record and print execution time
-- * Looping table with WHILE --
SET @index = 1
SET @startTime = GETDATE()
WHILE @index <= 10000
BEGIN
SET @MyID = (SELECT id FROM @MyData WHERE pk = @index)
-- * Do something based on the selected ID --
SET @index = @index + 1
END
SET @endTime = GETDATE()
-- * Print Execution Time of WHILE Loop --
PRINT 'Execution Time of WHILE Loop (ms)'
PRINT DATEDIFF(ms, @startTime, @endTime)
3. Using CURSOR to loop each record and print execution time
-- * Looping table with CURSOR --
SET @startTime = GETDATE()
DECLARE cur_Test CURSOR FAST_FORWARD
FOR SELECT id FROM @MyData
OPEN cur_Test
FETCH NEXT FROM cur_Test INTO @MyID
WHILE @@FETCH_STATUS = 0
BEGIN
-- * Do something based on the selected ID --
FETCH NEXT FROM cur_Test INTO @MyID;
END
CLOSE cur_Test
DEALLOCATE cur_Test
SET @endTime = GETDATE()
-- * Print Execution Time of CURSOR --
PRINT 'Execution Time of CURSOR (ms)'
PRINT DATEDIFF(ms, @startTime, @endTime)
SET NOCOUNT OFF
And finally I got these results.
Execution Time of WHILE Loop (ms)
11063
Execution Time of CURSOR (ms)
296
Kindly advise, thanks.
June 26, 2009 at 4:30 am
(Before we go on to set-based, let's tidy the RBAR a bit)
SET NOCOUNT ON
DECLARE @MyData TABLE(pk INT IDENTITY(1,1) PRIMARY KEY, id INT)
DECLARE @index INT
DECLARE @MyID INT
DECLARE @startTime DATETIME
DECLARE @endTime DATETIME
-- * Insert 10000 records --
SET @index = 1
WHILE @index <= 10000
BEGIN
INSERT INTO @MyData VALUES (@index)
SET @index = @index + 1
END
-- * Looping table with WHILE --
SET @index = 1
SET @startTime = GETDATE()
WHILE @index <= 10000
BEGIN
SELECT @MyID = id FROM @MyData WHERE pk = @index
-- * Do something based on the selected ID --
SET @index = @index + 1
END
SET @endTime = GETDATE()
-- * Print Execution Time of WHILE Loop --
PRINT 'Execution Time of WHILE Loop (ms)'
PRINT DATEDIFF(ms, @startTime, @endTime)
-- * Looping table with CURSOR --
SET @startTime = GETDATE()
DECLARE cur_Test CURSOR LOCAL STATIC FORWARD_ONLY
FOR SELECT id FROM @MyData
OPEN cur_Test
FETCH NEXT FROM cur_Test INTO @MyID
WHILE @@FETCH_STATUS = 0
BEGIN
-- * Do something based on the selected ID --
FETCH NEXT FROM cur_Test INTO @MyID;
END
CLOSE cur_Test
DEALLOCATE cur_Test
SET @endTime = GETDATE()
-- * Print Execution Time of CURSOR --
PRINT 'Execution Time of CURSOR (ms)'
PRINT DATEDIFF(ms, @startTime, @endTime)
SET NOCOUNT OFF
[font="Courier New"]Execution Time of WHILE Loop (ms)
0
Execution Time of CURSOR (ms)
236
[/font]
June 26, 2009 at 4:49 am
Paul White (6/26/2009)
(Before we go on to set-based, let's tidy the RBAR a bit)
Can we go on now? 😎
June 26, 2009 at 4:52 am
Hi Paul,
Thanks for enlightening me, I didn't notice that adding the PRIMARY KEY would make so much difference. :-D
June 26, 2009 at 4:53 am
yaya1899 (6/26/2009)
Recently I learned about CURSOR and found that most experts advise against using it due to performance issues. However, I found that using a CURSOR is much faster than using a WHILE loop. Thus, I would like to seek some valuable advice.
Cursors may be faster than while loops (though they usually aren't), but both are way slower than writing code in a set-based manner. SQL's optimised for set-based processes, not row-by-row processing.
Don't think about what you want to do to a row. Think about operating on the entire result set at once.
If you explain what it is that you're actually trying to do, we can probably get you a set-based alternative that runs significantly faster.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 26, 2009 at 5:10 am
Hi GilaMonster,
Basically I have a few tables that store my application's system data, and that data may be added/updated from time to time. Thus, I have a master script that stores the latest data and will be executed on the client's server upon application update.
What I'm currently doing is,
1. Create a table variable with a structure identical to the actual table.
2. Create an INSERT for each data row.
3. Compare the existing data with the table variable using WHILE/CURSOR looping.
4. Find the differences and take the necessary action.
Thanks.
June 26, 2009 at 5:19 am
Florian Reischl (6/26/2009)
Paul White (6/26/2009)
(Before we go on to set-based, let's tidy the RBAR a bit)
Can we go on now? 😎
Heh. Just trying to be fair about it. 😀
June 26, 2009 at 5:24 am
yaya1899 (6/26/2009)
Hi GilaMonster,
Basically I have a few tables that store my application's system data, and that data may be added/updated from time to time. Thus, I have a master script that stores the latest data and will be executed on the client's server upon application update.
What I'm currently doing is,
1. Create a table variable with a structure identical to the actual table.
2. Create an INSERT for each data row.
3. Compare the existing data with the table variable using WHILE/CURSOR looping.
4. Find the differences and take the necessary action.
Thanks.
That's not enough of a description for anyone to help with an alternative. Existing code would help, as would table structures and sample data. Read this to see the best way to post them in order to get quick responses.
http://www.sqlservercentral.com/articles/Best+Practices/61537/
It really doesn't sound like a cursor's needed. From your description it sounds like a fairly standard IF EXISTS... UPDATE... ELSE INSERT.
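Purely for illustration (the table and column names here are guesses, since we haven't seen your schema), that pattern looks roughly like this:
-- Hypothetical sketch only: #Country and its columns are assumptions, not your real schema
CREATE TABLE #Country (CountryCode CHAR(3) PRIMARY KEY, CountryName VARCHAR(60))
IF EXISTS (SELECT 1 FROM #Country WHERE CountryCode = 'CN')
    UPDATE #Country
    SET CountryName = 'China'
    WHERE CountryCode = 'CN'
ELSE
    INSERT INTO #Country (CountryCode, CountryName)
    VALUES ('CN', 'China')
DROP TABLE #Country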
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 26, 2009 at 5:38 am
So, Paul gave me the GO...
There is almost never a reason to work with cursors or loops in T-SQL. As Paul and Gail wrote, SQL Server is made for set-based operations, not row-based operations.
Let me show you a simple improvement of your posted code. You used a WHILE loop to create your sample data table:
PRINT '---======================================'
PRINT '-- Insert 10000 records with WHILE loop'
DECLARE @MyData TABLE(pk INT IDENTITY(1,1), id INT)
DECLARE @index INT
DECLARE @MyID INT
DECLARE @startTime DATETIME
DECLARE @endTime DATETIME
SELECT
@index = 1,
@startTime = GETDATE();
WHILE @index <= 10000
BEGIN
INSERT INTO @MyData VALUES (@index)
SET @index = @index + 1
END;
SELECT @endTime = GETDATE();
PRINT 'Duration WHILE loop (ms): ' + CONVERT(VARCHAR(10), DATEDIFF(MILLISECOND, @startTime, @endTime))
You should have a Numbers or Tally table in your database to handle these operations. Search this site for "Tally table" and you will find a really good article by Jeff Moden which explains how those tables work in different solutions.
Here's a little snippet to create a Numbers table:
IF OBJECT_ID('Numbers', 'U') IS NOT NULL DROP TABLE Numbers
GO
CREATE TABLE Numbers
(
Num INT NOT NULL
)
GO
--INSERT INTO Numbers
-- SELECT
-- N
-- FROM Tally
INSERT INTO Numbers
SELECT TOP(11000)
ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM master.sys.all_columns c1
CROSS JOIN master.sys.all_columns c2
GO
ALTER TABLE Numbers
ADD CONSTRAINT PK_Numbers
PRIMARY KEY CLUSTERED (Num)
WITH (FILLFACTOR = 100)
After you have a Numbers/Tally/Sequence/whatever table, you can use it to create your sample data like this:
DECLARE @MyData TABLE(pk INT IDENTITY(1,1), id INT)
DECLARE @startTime DATETIME
DECLARE @endTime DATETIME
PRINT '---======================================'
PRINT '-- Insert 10000 records with Numbers table'
SELECT
@startTime = GETDATE();
INSERT INTO @MyData
SELECT TOP(10000)
Num
FROM Numbers
SELECT @endTime = GETDATE();
PRINT 'Duration Numbers table (ms): ' + CONVERT(VARCHAR(10), DATEDIFF(MILLISECOND, @startTime, @endTime))
Sure, this is a very simple case, but there are many more problems that can be handled this way. Just read Jeff's article to see some of them.
Let's compare your WHILE loop to create data with a Numbers table solution:
---======================================
-- Insert 10000 records with WHILE loop
Duration WHILE loop (ms): 253
---======================================
-- Insert 10000 records with Numbers table
Duration Numbers table (ms): 16
yaya1899 (6/26/2009)
1. Create a table variable with a structure identical to the actual table.
2. Create an INSERT for each data row.
As Gail wrote, there is too little information to be sure that this is required. Maybe you can improve your performance by passing the data as XML, which usually works faster since it can be used for set-based operations.
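Just as a rough sketch of what I mean (the XML shape, variable and column names are made up for illustration, not taken from your system):
-- Hypothetical sketch: pass the default data as one XML document and shred it
-- into a table variable with a single set-based INSERT (all names are assumptions)
DECLARE @xml XML
SET @xml = N'
<Countries>
  <Country Code="UK" Name="United Kingdom" />
  <Country Code="JP" Name="Japan" />
</Countries>'
DECLARE @Country TABLE (CountryCode CHAR(3), CountryName VARCHAR(60))
INSERT INTO @Country (CountryCode, CountryName)
SELECT
    c.value('@Code', 'CHAR(3)'),
    c.value('@Name', 'VARCHAR(60)')
FROM @xml.nodes('/Countries/Country') AS t(c)
SELECT * FROM @Country
One XML document means one INSERT statement instead of one INSERT per row.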
3. Compare the existing data with the table variable using WHILE/CURSOR looping.
4. Find the differences and take the necessary action.
As Gail wrote, it really looks like there is no need for cursors or loops here. Usually you can handle this with joined UPDATE statements, OUTPUT, and bulk INSERT operations.
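Something along these lines, only a sketch with invented names to show the shape of it:
-- Hypothetical sketch: compare a staging table variable against a target table
-- entirely set-based (joined UPDATE with OUTPUT, then an INSERT of the missing rows)
CREATE TABLE #Target (SomeKey INT PRIMARY KEY, SomeValue VARCHAR(60))
DECLARE @Staging TABLE (SomeKey INT PRIMARY KEY, SomeValue VARCHAR(60))
DECLARE @updatedKeys TABLE (SomeKey INT)
-- rows that already exist but changed: one joined UPDATE, touched keys captured via OUTPUT
UPDATE tgt
SET tgt.SomeValue = src.SomeValue
OUTPUT inserted.SomeKey INTO @updatedKeys
FROM #Target tgt
INNER JOIN @Staging src ON src.SomeKey = tgt.SomeKey
WHERE tgt.SomeValue <> src.SomeValue
-- rows that do not exist yet: one set-based INSERT
INSERT INTO #Target (SomeKey, SomeValue)
SELECT src.SomeKey, src.SomeValue
FROM @Staging src
LEFT JOIN #Target tgt ON tgt.SomeKey = src.SomeKey
WHERE tgt.SomeKey IS NULL
DROP TABLE #Target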
Please give us more information to help us help you (:unsure: sounds strange?!?)
June 26, 2009 at 5:45 am
Nice work Flo 😉
June 26, 2009 at 6:07 am
yaya1899 (6/26/2009)
Thanks for enlightening me, I didn't notice that adding the PRIMARY KEY would make so much difference. :-D
Absolutely. Otherwise, SQL Server has to scan the table to find the next value for @MyID. With the index, it can get the value immediately.
BTW, I also changed the cursor definition slightly, since LOCAL cursors are faster (less slow?) than the (usually) default GLOBAL cursor. FORWARD_ONLY STATIC cursors may also be very slightly less slow than FAST_FORWARD cursors.
Occasionally it is not worth the effort* to avoid a cursor; generally however it really really is 🙂
Paul
* For example, if you already have some complex logic bound up in nested row-by-row procedures and you have a small one-off job to do, a cursor might be appropriate.
June 26, 2009 at 11:11 am
Let's take this as an example: my application has tables to store Country, State, City and Province, and each table has an FK to its parent table.
Example
1. CountryPK will be an FK in State.
2. StatePK will be an FK in City.
Assuming that data will be added from time to time, I have a default script that stores all the latest data.
Default Data Script
-- * Step 1 - Create a table variable with identical structure of the actual table --
DECLARE @Country TABLE (pk INT IDENTITY(1,1) PRIMARY KEY, newPK INT, CountryCode VARCHAR(60),
CountryName VARCHAR(60), isNew BIT DEFAULT(0))
DECLARE @recIndex INT
DECLARE @recCount INT
DECLARE @newRecPK INT
DECLARE @CountryCode CHAR(3)
-- * Step 2 - Create INSERT for each default data --
INSERT INTO @Country(CountryCode, CountryName) VALUES ('UK', 'United Kingdom')
INSERT INTO @Country(CountryCode, CountryName) VALUES ('JP', 'Japan')
INSERT INTO @Country(CountryCode, CountryName) VALUES ('CN', 'China')
INSERT INTO @Country(CountryCode, CountryName) VALUES ('HK', 'Hong Kong')
-- * Step 3 - Compare Data --
SET @recIndex = 1
SET @recCount = (SELECT COUNT(*) FROM @Country)
SET @newRecPK = ISNULL((SELECT MAX(CountryPK) + 1 FROM Country),1)
WHILE @recIndex <= @recCount
BEGIN
SET @CountryCode = (SELECT TOP 1 CountryCode FROM @Country WHERE pk = @recIndex)
IF NOT EXISTS (SELECT * FROM Country WHERE CountryCode = @CountryCode)
BEGIN
UPDATE @Country SET newPK = @newRecPK, isNew = 1 WHERE pk = @recIndex
SET @newRecPK = @newRecPK + 1
END
SET @recIndex = @recIndex + 1
END
-- * Step 4 - Checking for State, City, Province ... --
-- ...
-- * Step 5 - Insert New Country, State, City, Province ... --
INSERT INTO Country(CountryPK, CountryCode, CountryName)
SELECT newPK, CountryCode, CountryName FROM @Country WHERE isNew = 1
SELECT CAST(@@ROWCOUNT AS VARCHAR) + ' records added...'
Actual Table in Database
CREATE TABLE Country
(CountryPK INT PRIMARY KEY, CountryCode CHAR(3), CountryName VARCHAR(60))
The main reason I chose to use a table variable is that I need a place to store the newly added PKs, as they will be used in the further update steps.
Example
China is added and gets PK 3; all the states in China will then have FK 3 when inserted.
Perhaps my approach is too complicated; would using an UPDATE/SELECT be more efficient?
June 26, 2009 at 11:46 am
You can use a LEFT JOIN to get only the countries which do not yet exist in your destination table, and a ROW_NUMBER to create a PK.
CREATE TABLE #Country
(CountryPK INT PRIMARY KEY, CountryCode CHAR(3), CountryName VARCHAR(60))
INSERT INTO #Country(CountryPK, CountryCode, CountryName) VALUES (1, 'UK', 'United Kingdom')
INSERT INTO #Country(CountryPK, CountryCode, CountryName) VALUES (2, 'JP', 'Japan')
INSERT INTO #Country(CountryPK, CountryCode, CountryName) VALUES (3, 'HK', 'Hong Kong')
GO
-- * Step 1 - Create a table variable with identical structure of the actual table --
DECLARE @Country TABLE (pk INT IDENTITY(1,1) PRIMARY KEY, newPK INT, CountryCode VARCHAR(60),
CountryName VARCHAR(60), isNew BIT DEFAULT(0))
DECLARE @lastPK INT
-- * Step 2 - Create INSERT for each default data --
INSERT INTO @Country(CountryCode, CountryName) VALUES ('UK', 'United Kingdom')
INSERT INTO @Country(CountryCode, CountryName) VALUES ('JP', 'Japan')
INSERT INTO @Country(CountryCode, CountryName) VALUES ('CN', 'China')
INSERT INTO @Country(CountryCode, CountryName) VALUES ('HK', 'Hong Kong')
SELECT @lastPK = MAX(CountryPK)
FROM #Country
SELECT @lastPK = ISNULL(@lastPK, 0)
INSERT INTO #Country (
CountryPK,
CountryCode,
CountryName
)
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) + @lastPK,
c1.CountryCode,
c1.CountryName
FROM @Country c1
LEFT JOIN #Country c2 ON c1.CountryCode = c2.CountryCode
WHERE c2.CountryCode IS NULL
SELECT CAST(@@ROWCOUNT AS VARCHAR) + ' records added...'
GO
DROP TABLE #Country
Hint: This kind of primary key generation is very risky! You will run into problems when other applications do the same work at the same time. Why not use an IDENTITY in your Country table?
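Only as a sketch (assuming a schema change is acceptable for your application), it could look like this:
-- Hypothetical sketch: with an IDENTITY key SQL Server numbers the rows itself,
-- so the ROW_NUMBER()/@lastPK logic above is no longer needed
CREATE TABLE #Country
(CountryPK INT IDENTITY(1,1) PRIMARY KEY, CountryCode CHAR(3), CountryName VARCHAR(60))
DECLARE @Country TABLE (CountryCode CHAR(3), CountryName VARCHAR(60))
INSERT INTO @Country (CountryCode, CountryName) VALUES ('CN', 'China')
INSERT INTO #Country (CountryCode, CountryName)
SELECT c1.CountryCode, c1.CountryName
FROM @Country c1
LEFT JOIN #Country c2 ON c1.CountryCode = c2.CountryCode
WHERE c2.CountryCode IS NULL
DROP TABLE #Country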
June 26, 2009 at 8:26 pm
Hi Florian, thanks for your advice. I was thinking about using a procedural approach to solve the problem. Now I've learned that SQL is made for set-based operations.
Thanks again to those who replied. 😀