Cursors Be Gone!

  • Jeff

    You know what, I have to say - would it kill you to agree with what people have said rather than repeat what other people say and try to pass it off as wisdom?

    Atlantis Interactive - SQL Server Tools
My blog
    Why I wrote a sql query analyzer clone

Just for fun: if you had a banking application with millions of account records in a table and had to calculate some complex interest every day on the balance, based upon an account holder's billing date, which would be better, a cursor or a set-based operation?

In both cases you have three columns you need to check and update: balance, date and interest.

update accounts set balance = dbo.CalcNewBal(balance, interestRate) where billingDate = @today

    or

declare @accountId int

declare acct_cur cursor local for
select accountId from accounts where billingDate = @today

open acct_cur
fetch next from acct_cur into @accountId

while @@fetch_status = 0
begin
    update accounts
       set balance = dbo.CalcNewBal(balance, interestRate)
     where current of acct_cur

    fetch next from acct_cur into @accountId
end

close acct_cur
deallocate acct_cur

If you add just a little bit of business complexity to the calculation, it becomes almost impossible to code as a single UPDATE statement in a set-based operation.

I don't know which of the above will be faster. I would guess the set-based one, but maybe you'd be locking the database against customer withdrawals for longer, or ... as an earlier post said, it just depends upon your application.

The other restrictive issue is with operations such as BCP, which I use a lot, as I operate in a widely distributed environment where trusted server setup has not been introduced (the vendor who owns the software is a bit inexperienced when it comes to SQL Server). The easiest way to retrieve data is therefore to use a cursor to BCP data out of the target systems before BCPing it back into my local system. And you can't use BCP against a local temp table; I'd have to create and destroy tables holding my list data as required. That is also a solution, but not great for auditing and security purposes.

I try to avoid cursors whenever I can, but with FAST_FORWARD they're sometimes just too darn convenient. A guilty pleasure, like chocolate 🙂

    Have a safe festive season everyone!

  • Haroon Said (12/24/2008)


Just for fun: if you had a banking application with millions of account records in a table and had to calculate some complex interest every day on the balance, based upon an account holder's billing date, which would be better, a cursor or a set-based operation?

In both cases you have three columns you need to check and update: balance, date and interest.

update accounts set balance = dbo.CalcNewBal(balance, interestRate) where billingDate = @today

    or

declare @accountId int

declare acct_cur cursor local for
select accountId from accounts where billingDate = @today

open acct_cur
fetch next from acct_cur into @accountId

while @@fetch_status = 0
begin
    update accounts
       set balance = dbo.CalcNewBal(balance, interestRate)
     where current of acct_cur

    fetch next from acct_cur into @accountId
end

close acct_cur
deallocate acct_cur

If you add just a little bit of business complexity to the calculation, it becomes almost impossible to code as a single UPDATE statement in a set-based operation.

I don't know which of the above will be faster. I would guess the set-based one, but maybe you'd be locking the database against customer withdrawals for longer, or ... as an earlier post said, it just depends upon your application.

In the situation where you can write one UPDATE statement to do the job and there are millions of records, it would probably be orders of magnitude faster as a single UPDATE than as a cursor.
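To make that concrete, here is a hedged sketch of how a bit of business complexity can often still be expressed in a single set-based UPDATE by moving the per-row logic into a CASE expression. All table and column names (dbo.accounts, accountType, lastInterestDate and so on) are illustrative assumptions, not taken from the article.

```sql
-- Illustrative sketch only: dbo.accounts and its columns are assumed.
-- Tiered daily interest, applied once per billing day, as one set-based UPDATE.
UPDATE dbo.accounts
SET    balance = balance
               + balance * CASE WHEN balance >= 100000 THEN 0.0300  -- premium tier
                                WHEN balance >= 10000  THEN 0.0200  -- standard tier
                                ELSE 0.0100                         -- base tier
                           END / 365.0,
       lastInterestDate = CAST(GETDATE() AS DATE)
WHERE  billingDate = CAST(GETDATE() AS DATE);
```

If the calculation outgrows a CASE expression, an inline table-valued function applied via CROSS APPLY can often keep the operation set-based before you reach for a cursor.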

  • Matt Whitfield (12/24/2008)


    @jeff -

I tried to test the COALESCE method, but it was horrifically slow over about 10,000 rows. Also, I'm interested why you say SELECT COALESCE(@SQL ... when it's actually a lot quicker to just set @sql to '' first?

    Absolutely... I agree... COALESCE is the worst (more on that in a minute). As to why I used it... habit... a really bad one, left over from the days of SQL Server 2000. Pseudo-cursor methods similar to the following are much, much faster...

DECLARE @sql VARCHAR(MAX)

SELECT @sql = (SELECT 'DBCC CheckDB(' + QUOTENAME(Name) + ')' + CHAR(10)
                 FROM Master.sys.Databases
                WHERE Database_ID > 4
                  FOR XML PATH(''))

PRINT @sql

Of course, that won't work in SQL Server 2000, where most people revert to the COALESCE method or the straight method you suggested. In that case, ISNULL is a better way to go than COALESCE because it's faster and is almost as fast as the straight method... and it gives the added pleasure of driving the ANSI-only folks absolutely bonkers. 😛 That's why I normally post the COALESCE solution... it keeps me from having to explain over and over that I don't care about ANSI-compliant SQL because, until the standard catches up with reality and everyone follows it to a "T", code portability is just a myth.

    Here's a bit of test code I use to create test tables for this type of stuff...

--===== Create and populate a 1,000,000 row test table.
     -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
     -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
     -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
     -- Column "SomeMoney" has a range of 0.0000 to 99.9999 non-unique numbers
     -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
     -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
     -- for all rows.
     -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
     -- Jeff Moden
 SELECT TOP 1000000
        RowNum       = IDENTITY(INT,1,1),
        SomeInt      = ABS(CHECKSUM(NEWID()))%50000+1,
        SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                     + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
        SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
        SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
        SomeDate     = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
        SomeHex12    = RIGHT(NEWID(),12)
   INTO dbo.JBMTest
   FROM Master.dbo.SysColumns t1
  CROSS JOIN Master.dbo.SysColumns t2

--===== A table is not properly formed unless a Primary Key has been assigned
  ALTER TABLE dbo.JBMTest
    ADD PRIMARY KEY CLUSTERED (RowNum)

    ... and here're some performance tests that use that code to demonstrate what Matt is talking about...

PRINT '===== Matt''s cited method ====='
DECLARE @SomeString VARCHAR(MAX)
SET @SomeString = ''
SET STATISTICS TIME ON
SELECT @SomeString = @SomeString + SomeCsv + CHAR(10)
  FROM dbo.JBMTest
 WHERE RowNum <= 10000
SET STATISTICS TIME OFF
PRINT STR(LEN(@SomeString)) + ' Characters'
PRINT REPLICATE('=',78)
GO
-----------------------------------------------------------------------------
PRINT '===== Coalesce method ====='
DECLARE @SomeString VARCHAR(MAX)
SET STATISTICS TIME ON
SELECT @SomeString = COALESCE(@SomeString + CHAR(10),'') + SomeCsv
  FROM dbo.JBMTest
 WHERE RowNum <= 10000
SET STATISTICS TIME OFF
PRINT STR(LEN(@SomeString)) + ' Characters'
PRINT REPLICATE('=',78)
GO
-----------------------------------------------------------------------------
PRINT '===== IsNull method ====='
DECLARE @SomeString VARCHAR(MAX)
SET STATISTICS TIME ON
SELECT @SomeString = ISNULL(@SomeString + CHAR(10),'') + SomeCsv
  FROM dbo.JBMTest
 WHERE RowNum <= 10000
SET STATISTICS TIME OFF
PRINT STR(LEN(@SomeString)) + ' Characters'
PRINT REPLICATE('=',78)
GO
-----------------------------------------------------------------------------
PRINT '===== XML path method ====='
DECLARE @SomeString VARCHAR(MAX)
SET STATISTICS TIME ON
SELECT @SomeString = (SELECT SomeCsv + CHAR(10) FROM dbo.JBMTest WHERE RowNum <= 10000 FOR XML PATH(''))
SET STATISTICS TIME OFF
PRINT STR(LEN(@SomeString)) + ' Characters'
PRINT REPLICATE('=',78)
GO
-----------------------------------------------------------------------------

Here's what I get for results on my 6-year-old, single 1.8 GHz CPU, 1GB RAM desktop box...

===== Matt's cited method =====
SQL Server Execution Times:
   CPU time = 41406 ms, elapsed time = 69796 ms.
    700000 Characters
==============================================================================
===== Coalesce method =====
SQL Server Execution Times:
   CPU time = 89078 ms, elapsed time = 106410 ms.
    699999 Characters
==============================================================================
===== IsNull method =====
SQL Server Execution Times:
   CPU time = 42531 ms, elapsed time = 68476 ms.
    699999 Characters
==============================================================================
===== XML path method =====
SQL Server Execution Times:
   CPU time = 62 ms, elapsed time = 794 ms.
    700000 Characters
==============================================================================

The XML PATH method for repetitive concatenation is just nasty fast...
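One caveat worth adding to the XML PATH trick: FOR XML PATH('') entitizes characters such as &, < and > in the concatenated text. A common variant (assuming SQL Server 2005 or later) adds the TYPE directive and reads the result back with .value() to get the raw characters:

```sql
-- Sketch: the TYPE/.value() variant avoids XML entitization (& -> &amp; etc.)
DECLARE @sql VARCHAR(MAX);
SELECT @sql = (SELECT 'DBCC CheckDB(' + QUOTENAME(Name) + ')' + CHAR(10)
                 FROM master.sys.databases
                WHERE database_id > 4
                  FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)');
PRINT @sql;
```

For the DBCC example here the plain form is fine because nothing in the generated text gets entitized; the TYPE form matters once the concatenated strings can contain XML-special characters.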

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Matt Whitfield (12/24/2008)


    Jeff

    You know what, I have to say - would it kill you to agree with what people have said rather than repeat what other people say and try to pass it off as wisdom?

    Matt... I did agree and said so. There's also a bit to be added in retrospect. That's what discussions are about.

    --Jeff Moden



  • Matt's comment notwithstanding, I still think Jeff has a point. If he were preaching to the converted, it would be a tired old point, but far too many developers still default to RBAR thinking instead of comparing set-based vs procedural solutions based on a thorough understanding of both.

    Having said that, an interesting conversation pattern is running as a background thread here:

    - Tool A (e.g. cursors) gets bashed

    - Someone reveals the secret passphrase (e.g. FAST_FORWARD) to make tool A "perform properly"

    - This gets bashed, and the cycle repeats

    It kind of reminds me of the mythical "go-faster" SQL Server startup parameter; its use by "experts" would make it simultaneously a spell and a shibboleth. Face it, much default behaviour in SQL Server is default for a good reason, and when you feel the urge to override things, there should be an accompanying urge to research just what the heck you're about to do. There's a good chance you'll be making a good choice, but if you can't even enumerate the other options, let alone analyse them, how sure can you be?

    But back to the Cursor (tm) vs Gray-market Cursor vs Set-based argument. The million-row use case seems to be the interesting one here, since it contains the biggest potential to bring a system to its knees if the wrong decision is made. Here are a few thoughts...

    At some data set size, you'll have to worry about what you are storing temporarily, and where. The "where" could be the server memory and/or tempdb, memory and/or disk space on the middle tier etc. Yes, the problem can be shuffled between machines, but the carpet remains lumpy. What one would presumably like to do is to minimise the amount of data persisted, for example by storing only a set of retrieved keys instead of detailed data, and then retrieving only one row of detailed data at a time for further processing. This persisted data could take the shape of a keyset cursor, a hand-rolled keyset on the server (table variable or temp table) or a dataset elsewhere.
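As a hedged illustration of the "hand-rolled keyset" option above, the sketch below persists only the keys in a table variable and then fetches one detail row at a time; the object names (dbo.accounts, accountId, billingDate) are assumptions for the example, not from the thread.

```sql
-- Hand-rolled keyset sketch: store only keys, fetch detail row-by-row.
DECLARE @keys TABLE (rn INT IDENTITY(1,1) PRIMARY KEY, accountId INT NOT NULL);

INSERT INTO @keys (accountId)
SELECT accountId
  FROM dbo.accounts
 WHERE billingDate = CAST(GETDATE() AS DATE);   -- assumed filter

DECLARE @i INT, @n INT, @accountId INT;
SELECT @i = 1, @n = COUNT(*) FROM @keys;

WHILE @i <= @n
BEGIN
    SELECT @accountId = accountId FROM @keys WHERE rn = @i;
    -- ... retrieve and process the single detail row for @accountId here ...
    SET @i = @i + 1;
END;
```

The point is the footprint: only one integer per row is persisted while the loop runs, which is the same trade a keyset cursor makes, just with the bookkeeping visible.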

    Alternatively, you could hand things over to SQL Server and let it loose on a set-based solution. So will this be better? As usual, it depends, specifically on your database and query design, and on things like the concurrency situation at the time you're running it. Even if all the data retrieval for a million-row UPDATE could theoretically be done blindingly fast, you are statistically pretty likely to hit a blocking scenario if this is a main table in an OLTP environment during business hours, making a carpet-bombing update a bad choice in this scenario (yes, the old carpet again). Of course, row-by-row processing within a single transaction will land you with the exact same problem, exacerbated by being done more slowly.

    But assuming that a set-based update is feasible, how do you compare apples and apples? As (I believe) Jeff has eloquently pointed out elsewhere, the internals of that query may still involve some form of row-by-row processing, and a large amount of data may still need to be cached temporarily during query processing. However, an ideal database and query design can sidestep that by avoiding memory-hungry and blocking operators. And guess what? A nice clean loop-join query plan used in a set-based solution will still be just as nice and clean when supporting a dynamic cursor!

    I'm not trying to muddy the water here, just pointing out that there may be times when the cursor vs set-based argument could be a case of fiddling while Rome burns, inasmuch as both solutions may end up having their costs dominated by the same things. Yes, the cursor will ALWAYS do deeper stack traversals, but that is about the only constant.

  • Excellent article ...:)

  • Hi Gaby,

Thanks for the article. It is highly helpful from our perspective. At least this query will be of great help for those SPs/UDFs that are badly hit by excessive use of cursors. In this context I have just one more addition to your query: rather than deleting the data from the table variable, could we select the database by virtue of the identity value assigned to it, as in the following query?

declare @query varchar(300), @count int, @counter int, @dbname sysname
declare @dblist table (dbid int IDENTITY, dbname sysname)

insert into @dblist
select name from sys.databases where name not in ('master', 'model', 'msdb', 'tempdb')

select @count = count(*) from @dblist
select @counter = 1

while @counter <= @count
begin
    select @dbname = dbname from @dblist where @counter = dbid
    select @query = 'dbcc checkdb(' + quotename(@dbname) + ')'
    exec(@query)
    set @counter = @counter + 1
end
go

  • Arindam Ganguly (12/25/2008)


    Hi Gaby,

Thanks for the article. It is highly helpful from our perspective. At least this query will be of great help for those SPs/UDFs that are badly hit by excessive use of cursors. In this context I have just one more addition to your query: rather than deleting the data from the table variable, could we select the database by virtue of the identity value assigned to it, as in the following query?

declare @query varchar(300), @count int, @counter int, @dbname sysname
declare @dblist table (dbid int IDENTITY, dbname sysname)

insert into @dblist
select name from sys.databases where name not in ('master', 'model', 'msdb', 'tempdb')

select @count = count(*) from @dblist
select @counter = 1

while @counter <= @count
begin
    select @dbname = dbname from @dblist where @counter = dbid
    select @query = 'dbcc checkdb(' + quotename(@dbname) + ')'
    exec(@query)
    set @counter = @counter + 1
end
go

    Minor point, this section:

select name from sys.databases where name not in ('master', 'model', 'msdb', 'tempdb')
select @count = count(*) from @dblist
select @counter = 1

    is more efficiently handled thusly:

select name from sys.databases where name not in ('master', 'model', 'msdb', 'tempdb')
select @count = @@rowcount, @counter = 1

That gets rid of an unnecessary table scan and combines two SELECTs into one.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Jeff Moden (12/24/2008)


    Matt... I did agree and said so. There's also a bit to be added in retrospect. That's what discussions are about.

    Fair enough. I re-read it today and read it completely differently. That's working long hours for you 🙂


  • TheSQLGuru (12/26/2008)


    Arindam Ganguly (12/25/2008)


    Hi Gaby,

Thanks for the article. It is highly helpful from our perspective. At least this query will be of great help for those SPs/UDFs that are badly hit by excessive use of cursors. In this context I have just one more addition to your query: rather than deleting the data from the table variable, could we select the database by virtue of the identity value assigned to it, as in the following query?

declare @query varchar(300), @count int, @counter int, @dbname sysname
declare @dblist table (dbid int IDENTITY, dbname sysname)

insert into @dblist
select name from sys.databases where name not in ('master', 'model', 'msdb', 'tempdb')

select @count = count(*) from @dblist
select @counter = 1

while @counter <= @count
begin
    select @dbname = dbname from @dblist where @counter = dbid
    select @query = 'dbcc checkdb(' + quotename(@dbname) + ')'
    exec(@query)
    set @counter = @counter + 1
end
go

    Minor point, this section:

select name from sys.databases where name not in ('master', 'model', 'msdb', 'tempdb')
select @count = count(*) from @dblist
select @counter = 1

    is more efficiently handled thusly:

select name from sys.databases where name not in ('master', 'model', 'msdb', 'tempdb')
select @count = @@rowcount, @counter = 1

That gets rid of an unnecessary table scan and combines two SELECTs into one.

    Hi Kevin,

1. I think whichever loop you use, you will not notice any difference, because the DBCC will account for most of the run time.

2. If you really need a fast loop, have a look at C#; it is likely to run faster.

  • Matt Whitfield (12/26/2008)


    Jeff Moden (12/24/2008)


    Matt... I did agree and said so. There's also a bit to be added in retrospect. That's what discussions are about.

    Fair enough. I re-read it today and read it completely differently. That's working long hours for you 🙂

Well, heh... better that you took out a little frustration on me than on somebody at work. 🙂 Thanks for the comeback.

    --Jeff Moden



  • ****

    Hi Kevin,

1. I think whichever loop you use, you will not notice any difference, because the DBCC will account for most of the run time.

2. If you really need a fast loop, have a look at C#; it is likely to run faster.

    ****

My correction had nothing whatsoever to do with loops or runtime. It was simply correcting poor code. Unless such points are made, people cannot improve their ability to write good code. And the fixes are trivial in nature, so there is no excuse for them once someone has been taught to avoid them; not even the oft-mentioned "I don't have time to write good code because it takes too long". 😎


  • Gaby A. (12/24/2008)


    dford (12/24/2008)


Lots of good responses here. I have to say, I don't like seeing lopsided articles about getting rid of cursors. They still exist partly because there are good uses for them. There are so many of these "Cursors are Evil" articles that SQL noobs see them and kill millions of brain cells trying to avoid using cursors. I do think cursors are a bit overused, but I would challenge anyone to accomplish some of the beastly data migrations that I have had to do without a cursor. A balanced article would be more appropriate. Not everyone who reads these articles has years of SQL experience. This site shows up well in search results, and lots of new SQL developers and DBAs come here for answers. My rant on one-sided articles is done now. :D

My apologies if it seemed lopsided. So many scripts and articles on this and other web sites are ubiquitously populated with cursors that I thought I'd present the opposite approach.

    A couple things I've learned:

    1. First time article, I'll be prepared next time with more thorough examples.

    2. People get the most passionate about those things you expect to elicit the least amount of emotion.

    Have a good holiday folks, and a happy new year.

    Don't let the ******** get you down. It was a good first article. Do run the spell check on the next one though.

    I think one of the best bits of feedback you've received in these responses was Jeff's to be very wary of using the estimated cost in execution plans as a measure of performance. It's especially iffy when comparing two completely different execution plans, one to another. You have to include reads, CPU and actual execution time for a complete picture.
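For anyone wanting to follow that advice, a minimal pattern looks like the sketch below (reusing the dbo.JBMTest table from Jeff's script earlier in the thread); the figures SQL Server prints to the Messages tab give you logical reads, CPU time and actual elapsed time rather than an estimate.

```sql
-- Capture reads (IO) plus CPU and elapsed time for the query under test.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT COUNT(*) FROM dbo.JBMTest WHERE RowNum <= 10000;   -- query under test

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

When comparing two candidate queries, run each this way (ideally more than once, to separate cold-cache from warm-cache numbers) and compare all three measures, not just one.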

    Keep the articles coming.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 15 posts - 46 through 60 (of 272 total)
