December 23, 2008 at 10:29 pm
Comments posted to this topic are about the item Cursors Be Gone!
Gaby________________________________________________________________"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein
December 24, 2008 at 12:10 am
Good idea, Gaby, and good article...
Just a couple of tips... if you simply add Forward_Only and Read_Only to the cursor declaration to make it a "fire hose" cursor, you've basically done the same thing with a lot less work.
Also, don't trust the % of batch in any execution plan... it lies... sometimes, it lies a lot! For example... in the following code, both the estimated and actual execution plans say that the first query in the following code will take 0% of the batch and the 2nd query will take 100% of the batch even though they do the same thing. AND, when you run the code just the opposite is true if you look at the output in the message tab... the 2nd query blows the doors off the first query.
SET NOCOUNT ON
--=======================================================================================
-- Recursive method shown by (Name with-held)
--=======================================================================================
PRINT '========== Recursive method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
--===== Execute the code being tested ===================================================
DECLARE @DateVal DATETIME
SET @DateVal = '2008-01-01'
;with mycte as
(
select @DateVal AS DateVal
union all
select DateVal + 1
from mycte
where DateVal + 1 < DATEADD(yy, 5, @DateVal)
)
select @Bitbucket = d.dateval
from mycte d
OPTION (MAXRECURSION 0)
--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
GO
--=======================================================================================
-- Tally table method by Jeff Moden
--=======================================================================================
PRINT '========== Tally table method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
--===== Execute the code being tested ===================================================
DECLARE @StartDate AS DATETIME
SET @StartDate = '2008-01-01'
SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,5,@StartDate)))
@Bitbucket = @StartDate-1+t.N
FROM Tally t
ORDER BY N
--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
The reason why this occurs is because the first query actually contains RBAR in the form of a Recursive CTE and only the first "loop" is measured. Other anomolies also adversly affect the % of batch listings in execution plans.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 24, 2008 at 12:25 am
Almost forgot... don't forget that sp_MsForEachDB is nothing but a big ol' cursor. Look at the code if you don't believe me. Also, in SQL Server 2005, you don't even need a While loop for this type of stuff anymore... here's the same thing using a "pseudo-cursor" to build all of the SQL for all of the databases all at once...
DECLARE @sql NVARCHAR(MAX)
SELECT @sql = COALESCE(@SQL+CHAR(10),'') + 'DBCC CheckDB(' + QUOTENAME(Name) + ')'
FROM Master.sys.Databases
WHERE Database_ID > 4
PRINT @sql
EXEC (@SQL)
--Jeff Moden
Change is inevitable... Change for the better is not.
December 24, 2008 at 12:46 am
As I wrote on an email sent to Gaby you also have to be careful cause sometimes (absolutely random) the variable of type table is not being dumped from the memory by SQL Server. If you use a query in a web page to compute some amounts base on real time results and you hit refresh several times you might see that after a while the page is not updating with the good results. To get rid of this kind of problem I have used a temporary table instead of a variable of type table.
December 24, 2008 at 1:15 am
Good idea and nice article. Thank you
December 24, 2008 at 2:59 am
A bit of an odd example to pick. All that effort to avoid using a cursor on an example that executes dbcc checkdb for each of just a few rows. Surely the time it takes time to parse a cursor in this example is totally insignificant when compared to the time taken to do a 'dbcc checkdb' for each database?
Also, when you say "Imagine scaling the cursor to tables with millions of rows". I'm imagining it! Are you sure that inserting the rows into a temporary table with no index and deleting each row, in a loop, on an individual basis, and selecting the count(*) from the table for each row would actually be quicker than using a cursor to just scan through a recordset? :crazy:
December 24, 2008 at 3:00 am
As you can see in the image below, while the actual allocation/deallocation of the cursor doesn't have much of an impact, the inefficient way the query is executed is what causes this to be slow. Imagine scaling the cursor to tables with millions of rows.
The thought of scaling either RBAR to millions of rows fills my heart with sorrow.
But, making me even more sad is the fact that yes, we've replaced a cursor, but actually with a method that's even more inefficient. Say you did have 1,000,000 rows, for the cursor you would have the read of the data and cursor through it. For the other way, you would have the creation of the table (be it temporary or variable), the insert into it, and 2,000,000 selects and 1,000,000 deletes.
Try the following three queries - one of which is a cursor, one of which uses this method with a temp table, and one of which uses this method with a table variable. The query works on 100,000 rows and executes reasonably meaningless code so that we're testing the performance of the iteration method, rather than the performance of what is being executed.
Note also that in the temp table / table variable methods I have added a clustering PK to increase the performance of these methods - i got bored of waiting for the queries to finish without!
On my server, the cursor takes 12.562 seconds to execute, the temp table method takes 33.093 seconds to execute, and the table variable method takes 32.218 seconds to exeucte.
Cursor method:
SET NOCOUNT ON
declare @query varchar(100), @id int
declare BadCursor Cursor for
select top 100000 ID from [tblLogTransactions]
open BadCursor
fetch next from BadCursor into @id
while @@fetch_status = 0
begin
select @query = 'declare @i int; set @i = ' + convert(varchar,@id)
exec(@query)
fetch next from BadCursor into @id
end
close BadCursor
deallocate BadCursor
go
Temp Table method:
SET NOCOUNT ON
declare @query varchar(100), @id int
CREATE TABLE #muchWorseProblem (ID int primary key clustered)
INSERT INTO #muchWorseProblem (ID)
select top 100000 ID from [tblLogTransactions]
while (select count(*) from #muchWorseProblem) > 0
begin
select top 1 @id = ID from #muchWorseProblem
select @query = 'declare @i int; set @i = ' + convert(varchar,@id)
exec(@query)
delete from #muchWorseProblem where id = @id
end
Table Variable method:
SET NOCOUNT ON
declare @query varchar(100), @id int
DECLARE @muchWorseProblem TABLE (ID int primary key clustered)
INSERT INTO @muchWorseProblem (ID)
select top 100000 ID from [tblLogTransactions]
while (select count(*) from @muchWorseProblem) > 0
begin
select top 1 @id = ID from @muchWorseProblem
select @query = 'declare @i int; set @i = ' + convert(varchar,@id)
exec(@query)
delete from @muchWorseProblem where id = @id
end
:blink:
Atlantis Interactive - SQL Server Tools
My blog[/url]
Why I wrote a sql query analyzer clone
December 24, 2008 at 3:11 am
We've largely got rid of cursors, and have been using table variables for ages.
Only thing in the article that I'd change would be to ditch the Select Count and the delete and use a variable with @@rowcount (by doing a select x from table variable just before setting the variable)and a counter variable to control the loop, and to then select the record based on an IDENTITY column in your table variable (i.e. select x from table variable where ID = counter variable). Do what you need to do and then increment the counter so in effect you get a move next.
In practice this means you have a copy of the data and aren't getting rid of any, just in case you need to do something else with it later.
In our stored procedures working with hundreds, thousands and sometimes millions of records we've found this to be extremely efficient and fast. In one instance a stored procedure dropped from taking several minutes to run down to less than 10 seconds.
December 24, 2008 at 3:18 am
Jeff Moden (12/24/2008)
Good idea, Gaby, and good article...
Could you list one thing that's a good idea in this article, spelling mistakes aside?
December 24, 2008 at 4:26 am
Hello all,
I agree with Matt since a select count(*) is not quickly handled operation if you do not have indexes on a huge table. So if the source table we aim to use cursors on is really big, I believe Cursor method will handle the situation better.
I aggree with Darren also because he eliminates the select count(*).
This method can be used I believe by making a few modifications as I have copied down the sql codes.
But I could not escape from deleting rows using column values
declare @i int;
set @i = 0;
while 1=1
begin
select top 1 @id2 = ID from @muchWorseProblem
if @id2 is null
break;
select @query = 'declare @i int; set @i = ' + convert(varchar,@id2)
exec(@query)
delete from @muchWorseProblem where id = @id2
set @id2 = null
end
December 24, 2008 at 4:35 am
Having modified that script a bit, and run it on my server - the time was 23.078 secs. Improved, certainly, but not better.
SET NOCOUNT ON
declare @query varchar(100), @id int
DECLARE @muchWorseProblem TABLE (ID int primary key clustered)
INSERT INTO @muchWorseProblem (ID)
select top 100000 ID from [tblLogTransactions]
declare @id2 int;
declare @i int;
set @i = 0;
while 1=1
begin
select top 1 @id2 = ID from @muchWorseProblem
if @id2 is null
break;
select @query = 'declare @i int; set @i = ' + convert(varchar,@id2)
exec(@query)
delete from @muchWorseProblem where id = @id2
set @id2 = null
end
The quickest I could get it to go was the following, which does come in slightly quicker than a cursor, at 10.703 secs:
SET NOCOUNT ON
declare @query varchar(100), @id int, @rowNum int
DECLARE @muchWorseProblem TABLE (RowNum int IDENTITY(1, 1) primary key clustered, ID int)
INSERT INTO @muchWorseProblem (ID)
select top 100000 ID from [tblLogTransactions]
declare @maxRowNum int
SELECT @maxRowNum = MAX(RowNum) FROM @muchWorseProblem
declare @i int;
set @i = 1;
while @i <= @maxRowNum
begin
select @id = ID from @muchWorseProblem WHERE rowNum = @i
select @query = 'declare @i int; set @i = ' + convert(varchar,@id)
exec(@query)
set @i = @i + 1
end
Atlantis Interactive - SQL Server Tools
My blog[/url]
Why I wrote a sql query analyzer clone
December 24, 2008 at 5:39 am
I've always resisted responding to articles with obvious errors simply not wanting to get into arguments about which way to skin the cat. I hope people who read the original articles here also read all the comments and replies. I hear and see too many junior (and some not so junior) read articles then start using the techniques as THE way to do something.
I can spot several things I would change with a lot of the reply scripts. I won't list those right now because I think the most important thing for readers who visit here to know is that they MUST try different techniques and record the results so that they know which technique solves their problem best for them. They must decide on what trade-offs they can risk and which they can't.
Whenever I write an SP, USP, script, or even a TRIGGER, I comment exactly what I did and also comment exactly what alternate methods I used and why I chose the solution I did (for instance, why I chose a table variable over a temp table (or vice versa) or why I chose a cursor over a while loop). If you have several developers or DBAs who could be maintaining the code, this will help them understand your methods and may save them some time down the road.
December 24, 2008 at 5:44 am
If you can spot several things you would change, then please let us know. It's not about having an argument, it's about learning, and if you have a good method, then please do share it! 😀
Atlantis Interactive - SQL Server Tools
My blog[/url]
Why I wrote a sql query analyzer clone
December 24, 2008 at 5:52 am
I'm a bit puzzled by this article. Surely, in order to illustrate the dangers of using cursors or 'WHILE' loops, you would expect to see an example where there is an obvious performance or resource advantage in avoiding cursors. This routine is so trivial that it could be, and has been, written any number of ways without affecting its execution time at all. The trick, surely, is to take a problem that seems impossible to do without cursors and show how easy it actually is. Give the performance figures for a whole range of data sizes etc.
I'm perfectly happy to use WHILE loops for a job like the one described in this article, especially if it makes the routine more understandable. Am I wrong to do so?
Best wishes,
Phil Factor
December 24, 2008 at 6:13 am
I am very happy that this article was written. However, the speed increases were compared to each other by using percentage of time relative to the batch, which is misleading. For example, 69% of 1 second is a faster process than 31% of 4 seconds.
Viewing 15 posts - 1 through 15 (of 272 total)
You must be logged in to reply to this topic. Login to reply