Cursors Be Gone!

  • Comments posted to this topic are about the item Cursors Be Gone!

    Gaby________________________________________________________________"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein

  • Good idea, Gaby, and good article...

    Just a couple of tips... if you simply add Forward_Only and Read_Only to the cursor declaration to make it a "fire hose" cursor, you've basically done the same thing with a lot less work.

    Also, don't trust the % of batch in any execution plan... it lies... sometimes, it lies a lot! For example... in the following code, both the estimated and actual execution plans say that the first query in the following code will take 0% of the batch and the 2nd query will take 100% of the batch even though they do the same thing. AND, when you run the code just the opposite is true if you look at the output in the message tab... the 2nd query blows the doors off the first query.

    SET NOCOUNT ON

    --=======================================================================================

    -- Recursive method shown by (Name with-held)

    --=======================================================================================

    PRINT '========== Recursive method =========='

    --===== Turn on some performance counters ===============================================

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.

    --===== Execute the code being tested ===================================================

    DECLARE @DateVal DATETIME

    SET @DateVal = '2008-01-01'

    ;with mycte as

    (

    select @DateVal AS DateVal

    union all

    select DateVal + 1

    from mycte

    where DateVal + 1 < DATEADD(yy, 5, @DateVal)

    )

    select @Bitbucket = d.dateval

    from mycte d

    OPTION (MAXRECURSION 0)

    --===== Turn off the performance counters and print a separator =========================

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',90)

    GO

    --=======================================================================================

    -- Tally table method by Jeff Moden

    --=======================================================================================

    PRINT '========== Tally table method =========='

    --===== Turn on some performance counters ===============================================

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.

    --===== Execute the code being tested ===================================================

    DECLARE @StartDate AS DATETIME

    SET @StartDate = '2008-01-01'

    SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,5,@StartDate)))

    @Bitbucket = @StartDate-1+t.N

    FROM Tally t

    ORDER BY N

    --===== Turn off the performance counters and print a separator =========================

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',90)

    The reason why this occurs is because the first query actually contains RBAR in the form of a Recursive CTE and only the first "loop" is measured. Other anomolies also adversly affect the % of batch listings in execution plans.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Almost forgot... don't forget that sp_MsForEachDB is nothing but a big ol' cursor. Look at the code if you don't believe me. Also, in SQL Server 2005, you don't even need a While loop for this type of stuff anymore... here's the same thing using a "pseudo-cursor" to build all of the SQL for all of the databases all at once...

    DECLARE @sql NVARCHAR(MAX)

    SELECT @sql = COALESCE(@SQL+CHAR(10),'') + 'DBCC CheckDB(' + QUOTENAME(Name) + ')'

    FROM Master.sys.Databases

    WHERE Database_ID > 4

    PRINT @sql

    EXEC (@SQL)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As I wrote on an email sent to Gaby you also have to be careful cause sometimes (absolutely random) the variable of type table is not being dumped from the memory by SQL Server. If you use a query in a web page to compute some amounts base on real time results and you hit refresh several times you might see that after a while the page is not updating with the good results. To get rid of this kind of problem I have used a temporary table instead of a variable of type table.

  • Good idea and nice article. Thank you

  • A bit of an odd example to pick. All that effort to avoid using a cursor on an example that executes dbcc checkdb for each of just a few rows. Surely the time it takes time to parse a cursor in this example is totally insignificant when compared to the time taken to do a 'dbcc checkdb' for each database?

    Also, when you say "Imagine scaling the cursor to tables with millions of rows". I'm imagining it! Are you sure that inserting the rows into a temporary table with no index and deleting each row, in a loop, on an individual basis, and selecting the count(*) from the table for each row would actually be quicker than using a cursor to just scan through a recordset? :crazy:

  • As you can see in the image below, while the actual allocation/deallocation of the cursor doesn't have much of an impact, the inefficient way the query is executed is what causes this to be slow. Imagine scaling the cursor to tables with millions of rows.

    The thought of scaling either RBAR to millions of rows fills my heart with sorrow.

    But, making me even more sad is the fact that yes, we've replaced a cursor, but actually with a method that's even more inefficient. Say you did have 1,000,000 rows, for the cursor you would have the read of the data and cursor through it. For the other way, you would have the creation of the table (be it temporary or variable), the insert into it, and 2,000,000 selects and 1,000,000 deletes.

    Try the following three queries - one of which is a cursor, one of which uses this method with a temp table, and one of which uses this method with a table variable. The query works on 100,000 rows and executes reasonably meaningless code so that we're testing the performance of the iteration method, rather than the performance of what is being executed.

    Note also that in the temp table / table variable methods I have added a clustering PK to increase the performance of these methods - i got bored of waiting for the queries to finish without!

    On my server, the cursor takes 12.562 seconds to execute, the temp table method takes 33.093 seconds to execute, and the table variable method takes 32.218 seconds to exeucte.

    Cursor method:

    SET NOCOUNT ON

    declare @query varchar(100), @id int

    declare BadCursor Cursor for

    select top 100000 ID from [tblLogTransactions]

    open BadCursor

    fetch next from BadCursor into @id

    while @@fetch_status = 0

    begin

    select @query = 'declare @i int; set @i = ' + convert(varchar,@id)

    exec(@query)

    fetch next from BadCursor into @id

    end

    close BadCursor

    deallocate BadCursor

    go

    Temp Table method:

    SET NOCOUNT ON

    declare @query varchar(100), @id int

    CREATE TABLE #muchWorseProblem (ID int primary key clustered)

    INSERT INTO #muchWorseProblem (ID)

    select top 100000 ID from [tblLogTransactions]

    while (select count(*) from #muchWorseProblem) > 0

    begin

    select top 1 @id = ID from #muchWorseProblem

    select @query = 'declare @i int; set @i = ' + convert(varchar,@id)

    exec(@query)

    delete from #muchWorseProblem where id = @id

    end

    Table Variable method:

    SET NOCOUNT ON

    declare @query varchar(100), @id int

    DECLARE @muchWorseProblem TABLE (ID int primary key clustered)

    INSERT INTO @muchWorseProblem (ID)

    select top 100000 ID from [tblLogTransactions]

    while (select count(*) from @muchWorseProblem) > 0

    begin

    select top 1 @id = ID from @muchWorseProblem

    select @query = 'declare @i int; set @i = ' + convert(varchar,@id)

    exec(@query)

    delete from @muchWorseProblem where id = @id

    end

    :blink:

    Atlantis Interactive - SQL Server Tools
    My blog[/url]
    Why I wrote a sql query analyzer clone

  • We've largely got rid of cursors, and have been using table variables for ages.

    Only thing in the article that I'd change would be to ditch the Select Count and the delete and use a variable with @@rowcount (by doing a select x from table variable just before setting the variable)and a counter variable to control the loop, and to then select the record based on an IDENTITY column in your table variable (i.e. select x from table variable where ID = counter variable). Do what you need to do and then increment the counter so in effect you get a move next.

    In practice this means you have a copy of the data and aren't getting rid of any, just in case you need to do something else with it later.

    In our stored procedures working with hundreds, thousands and sometimes millions of records we've found this to be extremely efficient and fast. In one instance a stored procedure dropped from taking several minutes to run down to less than 10 seconds.

  • Jeff Moden (12/24/2008)


    Good idea, Gaby, and good article...

    Could you list one thing that's a good idea in this article, spelling mistakes aside?

  • Hello all,

    I agree with Matt since a select count(*) is not quickly handled operation if you do not have indexes on a huge table. So if the source table we aim to use cursors on is really big, I believe Cursor method will handle the situation better.

    I aggree with Darren also because he eliminates the select count(*).

    This method can be used I believe by making a few modifications as I have copied down the sql codes.

    But I could not escape from deleting rows using column values

    declare @i int;

    set @i = 0;

    while 1=1

    begin

    select top 1 @id2 = ID from @muchWorseProblem

    if @id2 is null

    break;

    select @query = 'declare @i int; set @i = ' + convert(varchar,@id2)

    exec(@query)

    delete from @muchWorseProblem where id = @id2

    set @id2 = null

    end

  • Having modified that script a bit, and run it on my server - the time was 23.078 secs. Improved, certainly, but not better.

    SET NOCOUNT ON

    declare @query varchar(100), @id int

    DECLARE @muchWorseProblem TABLE (ID int primary key clustered)

    INSERT INTO @muchWorseProblem (ID)

    select top 100000 ID from [tblLogTransactions]

    declare @id2 int;

    declare @i int;

    set @i = 0;

    while 1=1

    begin

    select top 1 @id2 = ID from @muchWorseProblem

    if @id2 is null

    break;

    select @query = 'declare @i int; set @i = ' + convert(varchar,@id2)

    exec(@query)

    delete from @muchWorseProblem where id = @id2

    set @id2 = null

    end

    The quickest I could get it to go was the following, which does come in slightly quicker than a cursor, at 10.703 secs:

    SET NOCOUNT ON

    declare @query varchar(100), @id int, @rowNum int

    DECLARE @muchWorseProblem TABLE (RowNum int IDENTITY(1, 1) primary key clustered, ID int)

    INSERT INTO @muchWorseProblem (ID)

    select top 100000 ID from [tblLogTransactions]

    declare @maxRowNum int

    SELECT @maxRowNum = MAX(RowNum) FROM @muchWorseProblem

    declare @i int;

    set @i = 1;

    while @i <= @maxRowNum

    begin

    select @id = ID from @muchWorseProblem WHERE rowNum = @i

    select @query = 'declare @i int; set @i = ' + convert(varchar,@id)

    exec(@query)

    set @i = @i + 1

    end

    Atlantis Interactive - SQL Server Tools
    My blog[/url]
    Why I wrote a sql query analyzer clone

  • I've always resisted responding to articles with obvious errors simply not wanting to get into arguments about which way to skin the cat. I hope people who read the original articles here also read all the comments and replies. I hear and see too many junior (and some not so junior) read articles then start using the techniques as THE way to do something.

    I can spot several things I would change with a lot of the reply scripts. I won't list those right now because I think the most important thing for readers who visit here to know is that they MUST try different techniques and record the results so that they know which technique solves their problem best for them. They must decide on what trade-offs they can risk and which they can't.

    Whenever I write an SP, USP, script, or even a TRIGGER, I comment exactly what I did and also comment exactly what alternate methods I used and why I chose the solution I did (for instance, why I chose a table variable over a temp table (or vice versa) or why I chose a cursor over a while loop). If you have several developers or DBAs who could be maintaining the code, this will help them understand your methods and may save them some time down the road.

  • If you can spot several things you would change, then please let us know. It's not about having an argument, it's about learning, and if you have a good method, then please do share it! 😀

    Atlantis Interactive - SQL Server Tools
    My blog[/url]
    Why I wrote a sql query analyzer clone

  • I'm a bit puzzled by this article. Surely, in order to illustrate the dangers of using cursors or 'WHILE' loops, you would expect to see an example where there is an obvious performance or resource advantage in avoiding cursors. This routine is so trivial that it could be, and has been, written any number of ways without affecting its execution time at all. The trick, surely, is to take a problem that seems impossible to do without cursors and show how easy it actually is. Give the performance figures for a whole range of data sizes etc.

    I'm perfectly happy to use WHILE loops for a job like the one described in this article, especially if it makes the routine more understandable. Am I wrong to do so?

    Best wishes,
    Phil Factor

  • I am very happy that this article was written. However, the speed increases were compared to each other by using percentage of time relative to the batch, which is misleading. For example, 69% of 1 second is a faster process than 31% of 4 seconds.

Viewing 15 posts - 1 through 15 (of 272 total)

You must be logged in to reply to this topic. Login to reply