Eliminating Cursors

  • Agreed with Steve. This method may work for the author, but it is not the one I would choose or recommend; it is a last resort when nothing else can be improved, or when you have only five minutes to spend. A hardware solution to a performance problem is costly and dangerous, because the next step is to buy an ever more powerful server instead of fixing the bad code. In my experience, 90% of performance issues come down to bad code or bad architectural design.

    In this case, instead of replacing the cursor with a temp table, I would look for a way to remove the looping construct entirely and produce set-based code driven by the data rather than by row order. In most cases that gives you not a 75% improvement but a 2x to 10x improvement. A loop should be the last option considered, especially once the row count passes a certain threshold (say 200-300, and maybe even lower) or when the number of rows is unpredictable. But in some cases looping is the only choice!

    One last note: performance is not only a function of the elapsed time a process (stored procedure) takes; in many cases it is about how that process interacts with other processes in a multi-user environment, especially batch processes, data load processes, and reports.

    Leo Peysakhovich

  • This is indeed new information for me, and a little bit of Googling produced this

    this

    Many thanks for adding to my knowledge

  • Why not use xp_execresultset?

    E.g.:

    EXEC master..xp_execresultset N'SELECT
        ''EXEC dbo.my_procedure '' + CAST(param_1 AS VARCHAR) + '','' + CAST(param_2 AS VARCHAR)
    FROM (SELECT param_1, param_2 FROM MyDB..MyTable) temp',
    N'MyDB'

    This keeps everything set-oriented, and should give you much better performance than a cursor or WHILE loop. In addition, the code is less verbose.
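
    (Note that xp_execresultset is an undocumented extended stored procedure, and as far as I know it was removed in later versions of SQL Server, so use it with caution.)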

     

  • I agree w/ Mr. Hicklin; a 75% performance improvement is highly dubious.

    I would understand it if performance was perhaps 75% WORSE, though. When inserting data into a table variable, SQL Server will not use parallelism. I'm not aware of any other situation where this restriction occurs. So if the query that populates the cursor/table variable is large or complex enough to benefit substantially from parallelism, then the author's suggested approach will perform worse than the cursor.

    Using a true #temp table instead of a table variable is a workaround for that... then I'd expect to see the same performance for both the cursor and fake-cursor approaches.

    I see no benefit to this approach at all.
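
    For illustration, a minimal sketch of the workaround, assuming a hypothetical dbo.BigTable:

    -- Populating a table variable: the INSERT cannot use a parallel plan
    DECLARE @rows TABLE (id INT, val VARCHAR(50))
    INSERT INTO @rows (id, val)
    SELECT id, val FROM dbo.BigTable WHERE val LIKE 'A%'

    -- Populating a true #temp table: the same SELECT can use parallelism
    SELECT id, val
    INTO #rows
    FROM dbo.BigTable
    WHERE val LIKE 'A%'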

  • On a different note, how do you approach eliminating a cursor for "batch" inserting correlated items when the parent item uses an identity as its PK (for example, inserting a batch of orders and the corresponding order items)?

    I have two approaches, but neither is very elegant (see the sketch below):

    • Reserve some IDs on the table and enable "identity insert" to insert the calculated IDs and the child items;
    • Add another column to the table to insert an ID with which I can relate the orders and order items.

    Another approach is with the help of triggers, but I'm not very fond of using triggers.
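
    A minimal sketch of the second approach, with all table and column names made up; the extra BatchRef column on Orders exists only to relate parents to children during the load, and its values are assumed unique per load:

    -- Staging tables carry a batch-local reference number
    CREATE TABLE #NewOrders (BatchRef INT, CustomerID INT)
    CREATE TABLE #NewItems  (BatchRef INT, Product VARCHAR(50))

    -- 1. Insert the parents, carrying the batch reference in the extra column
    INSERT INTO dbo.Orders (CustomerID, BatchRef)
    SELECT CustomerID, BatchRef
    FROM #NewOrders

    -- 2. Join back on the extra column to pick up the generated identity values
    INSERT INTO dbo.OrderItems (OrderID, Product)
    SELECT o.OrderID, i.Product
    FROM dbo.Orders o
    JOIN #NewItems i ON i.BatchRef = o.BatchRef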

  • Yeah, thanks for that; it cleared a few things up for me too... it's amazing what can be learnt from a simple little question.

    Cheers

    Bob

  • Temp tables will suffer from the same performance problems as cursors, and temp tables increase concurrency problems; in older versions of SQL Server, the use of temp tables caused horrible problems due to locking of tempdb. The temp table method is not that great... I will post a good solution to this problem later today...

    Jasmine

  • Surprisingly enough, when doing mass UPDATES or DELETES on very large tables, I've found that using a WHILE loop to limit the number of rows affected at one time gives a performance increase.  I suspect it's mostly due to logging.  As an example, I recently had to perform a massive, selective DELETE on a table with 33 million records:

    DELETE FROM myTable
    WHERE LastName = ''

    What I found was that the DELETE took an extremely long time on my (admittedly) woefully underpowered development machine.  So I added a WHILE loop to break the delete up into chunks:

    DECLARE @start INT
    SET @start = 0
    DECLARE @end INT
    SET @end = 33000000

    WHILE @start <= @end
    BEGIN
        -- delete one band of a million rowIDs at a time, keeping each
        -- transaction (and its impact on the log) relatively small
        DELETE FROM myTable
        WHERE rowID > @start AND rowID <= @start + 1000000
        AND LastName = ''

        SET @start = @start + 1000000
    END

    I agree that trading in a cursor for a WHILE-loop row-by-row iterator probably doesn't provide as much benefit as a good set-based solution will. And I find the author's results suspect, although I'll have to wait until I get home to test them.

  • SQL is set-based.  Everything is done in sets.  Almost anything (with very, very, very few exceptions) that you can do with CURSORS can be done with set-based SQL commands + a little imagination.  SQL is a "declarative" language; i.e., you tell SQL what you want and the SQL engine finds the best way to achieve the desired results.  This can be a strange notion for programmers coming from "imperative" languages like C++, VB, etc.

    One of the best comparisons I've seen is the "cab driver" analogy.  The "imperative" method is analogous to jumping into a cab and giving the cab driver detailed directions to get you to the airport ("turn left at the next light", "turn right at the corner", "stop at this light", "get off at this exit", etc.)  You achieve your desired result by telling the cab driver exactly how to accomplish each and every step along the way.

    With the "declarative" method you trust that the cab driver knows how to get to the airport. You tell the cab driver "take me to the airport", and assume he will find the best route. You assume the cab driver knows many things that you might not be aware of (e.g., traffic patterns at this time of day, recent accidents, weather, shortcuts, etc.) In SQL, you define your end results and the SQL engine determines the best path from point A to point Z; the SQL engine also takes into account a lot of factors which you may or may not know of, or care about.

    When you use CURSORS, you're stepping back into an imperative mindset.  You are telling the SQL engine how to do its job, one row at a time.  This kills a lot of the performance benefit you would get by letting the SQL Server arrange and optimize the details of your request for you.
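
    For illustration, here's the same hypothetical price update written both ways (the dbo.Products table and the 10% raise are made up for the example):

    -- Imperative, row-at-a-time: telling the engine how to do its job
    DECLARE @id INT
    DECLARE c CURSOR LOCAL FOR
        SELECT ProductID FROM dbo.Products WHERE Discontinued = 0
    OPEN c
    FETCH NEXT FROM c INTO @id
    WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE dbo.Products SET Price = Price * 1.10 WHERE ProductID = @id
        FETCH NEXT FROM c INTO @id
    END
    CLOSE c
    DEALLOCATE c

    -- Declarative, set-based: describe the result and let the optimizer find the path
    UPDATE dbo.Products
    SET Price = Price * 1.10
    WHERE Discontinued = 0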

  • Exactly right. Trusting a TABLE variable (as opposed to a temp table, or any other table for that matter) not to use TEMPDB is a fallacy. There is no specification that TABLE variables do *not* use TEMPDB, so the belief that they won't affect TEMPDB is not a valid assumption.

  • I have seen this approach many times and I must say that I hate it. Avoiding cursors is a good thing, but this approach should be called "roll your own cursor." It isn't cursors that should be avoided so much as row-at-a-time operations. If there is no alternative (and sometimes there isn't), use a cursor, for heaven's sake! Particularly when it comes to large record sets, a firehose cursor will outperform the temp table/WHILE loop trick, unless you need an index on the temp table; but even then, if you have to loop through the records, just use a cursor.
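
    By "firehose" cursor I mean a read-only, forward-only cursor; in T-SQL that's the FAST_FORWARD option. A minimal declaration sketch, with the table name made up:

    DECLARE c CURSOR LOCAL FAST_FORWARD FOR
        SELECT OrderID FROM dbo.Orders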

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • As others have mentioned, STATIC or LOCAL STATIC cursors resolve many of the performance issues seen with cursors.

    I did once see an indication that non-STATIC cursors behave in a significantly non-optimal way. I was running the same basic routine with slightly different selection criteria, using a non-static cursor. The table I was running against held several million rows; my first selection criterion should have covered many more rows than my second. However, both were taking a very long time.

    When I investigated the actual numbers of records returned, the first selection criterion returned something like 15,000 records. The second returned 4. Yes, just 4. And it was taking hours to do. This has led me to believe that non-static cursors do something very peculiar, and that their performance is tied more to the size of the tables they're reading than to the size of the result set.

    I use LOCAL STATIC cursors whenever I need to use cursors.
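
    For reference, a minimal sketch of that declaration (the query is made up):

    DECLARE c CURSOR LOCAL STATIC FOR
        SELECT OrderID FROM dbo.Orders
    -- STATIC materializes the result set once (into a work table in tempdb),
    -- so fetches read from that snapshot instead of rescanning the base tables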

    Also: sometimes readability and understandability are more important than outright speed in a query. When others will need to work with it far longer than you, or it will be used infrequently and need modification every time it's used, a method that is clearer to others can make the query much easier to maintain and use in the future.


    R David Francis

  • I published this technique in May 2005 (check http://www.sqlservercentral.com/columnists/aGrinberg/thearrayinsqlserver2000.asp). There is a problem with the TABLE variable: if you have a big row set, the TABLE variable will slow down performance. In that case a temporary table is a better choice, but if the row set is very big (more than 100,000 rows) I would recommend creating a table with a clustered index on the column with the identity property (in this case it serves as the row pointer), usually the first column. If you have a chance, try it and you will see the difference in speed.
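
    A minimal sketch of that kind of work table (names made up):

    CREATE TABLE dbo.WorkList
    (
        RowPointer INT IDENTITY(1, 1) NOT NULL,  -- the row pointer for the loop
        OrderID    INT NOT NULL,
        CONSTRAINT PK_WorkList PRIMARY KEY CLUSTERED (RowPointer)
    )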

  • DECLARE @var INT, @othervar VARCHAR(50)
    SET @var = -2147483648   -- start below the smallest value in the table

    WHILE 1 = 1
    BEGIN
        SELECT TOP 1
            @var = var,
            @othervar = othervar
        FROM dbo.MyTable     -- placeholder table name
        WHERE var > @var
        ORDER BY var

        IF @@ROWCOUNT = 0 BREAK

        PRINT 'do something to ' + @othervar
    END

    That's as stripped down as it gets for a loop. Now isn't that easier to read than cursor code? I realize that most of the time loops aren't needed in SQL, but sometimes they are, and it should be super simple and clear. Imagine making a Java or C# programmer write all that cursor code to do a loop; they would laugh in our faces.

    Signature is NULL

  • I was working on a conversion project that required processing each record in several tables. The process had to start and finish inside of 48 hours. The overhead of cursors took too much time and was memory-intensive. Performance increased dramatically when using the approach listed here: the query plan is reused again and again, and the key column should naturally be indexed for lightning-fast results.

    declare @id int      -- key column identifier
    declare @rows int    -- rows returned by the select statement

    set @id = -1         -- start one position less than the smallest id
    set @rows = 1        -- set to 1 so that the loop body runs at least once

    while @rows > 0
    BEGIN
        select top 1
            @id = orderid        -- assign value to @id
        from dbo.[Order Details]
        where @id < orderid      -- only ids we haven't processed yet
        order by orderid         -- ensure identifiers are processed in order

        set @rows = @@ROWCOUNT   -- IMPORTANT: this must be the first statement
                                 -- after the SELECT, or @@ROWCOUNT is lost

        if @rows > 0
        begin
            print @id            -- execute a stored procedure here to process @id
        end
    END

     

    Calvin, I'm with you, bro.
