There Must Be 15 Ways To Lose Your Cursors... part 1, Introduction

  • katedgrt (4/15/2009)


    RBarryYoung (4/14/2009)


    ... (Note: Wikipedia is using revisionist terminology that would instead call this "imperative programming", but I like the old terms better).

    I haven't heard the term revisionist terminology before but I have a need for it, and I dig your use of it here. It applies in so many situations these days, my resume included. Although I was in the Data Integration department at the Big Bank for years, my resume now reflects that work as ETL. Same concept, different century.

    Well, now I have to confess, my use of "revisionist" is probably a little bit too strong. :blush: The problem is the only other description I could think of ("latter-day" terminology) is not nearly strong enough.

    You see, "revisionist" implies an attempt to rewrite history, and that is not what really happened AFAIK, though there was (and is) some ulterior motives in re-casting traditional terms. Usually it's that someone(s) either wants to reuse the better, older words for their own initiatives or theories or they want try to recast peoples understanding of the distinction between different "classes" of things.

    In this case, I believe that the OO (object-oriented) proponents of the mid-90's wanted to rebrand the object-oriented sub-paradigm as something completely different from the more traditional 3GLs like Fortran, COBOL, etc. The distinction between these two groups of Procedural languages (languages written in the form of a procedure) is that the newer ones were "Object-Method Oriented", meaning that they created and used methods in objects, whereas the older ones were "Procedure-Oriented", meaning that they created and used procedures without objects (a method is just a procedure in an object). Notice the subtle but real difference between "Procedural" and "Procedure-Oriented".

    Now the OO folks had tried to take both 5GL and 6GL as designations earlier, but that never really went anywhere. So instead they decided to demarcate the new languages as "not Procedural languages", though they were actually Procedural languages that were object-oriented. Fast forward a few years and the distinctions of the previous paradigms have been lost, and now folks have to use a new term for the larger group of non-declarative languages, so they are now called "imperative". Yuck.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Andy DBA (4/15/2009)


    I apologize if this is somewhat off topic and there's probably tons of articles on it already, but here is a word to the wise on performance testing. I noticed GSquared and others using GetDate() and DateDiff to capture execution times. I call this "wall-clock benchmarking". If you are the only user on your server and/or are taking averages of multiple tests your comparisons may be pretty good, but any other processes running on your server (including the OS!) can throw your results way off :w00t:.

    Someone with good expertise on the guts of SQL Server please feel free to jump in here, but I highly recommend querying master.dbo.sysprocesses with the @@spid system variable to get before and after resource usage values and then taking the average of multiple iterations. (see code for one iteration below) Also, don't forget about execution plan caching. Depending on what you're testing, you may want to throw out your first set of results.

    Here's the SQL I suggest using to capture CPU usage and I/O. I think BOL explains exactly what these values mean, but for A/B comparisons on the same machine, the raw values are usually good enough.

    declare @start_cpu int
    declare @end_cpu int
    declare @start_io int
    declare @end_io int

    select @start_cpu = sum(cpu), @start_io = sum(physical_io) from master.dbo.sysprocesses where spid = @@spid

    /* Insert SQL to be performance tested here */

    select @end_cpu = sum(cpu), @end_io = sum(physical_io) from master.dbo.sysprocesses where spid = @@spid

    select @end_cpu - @start_cpu as cpu_used, @end_io - @start_io as io_used

    --Note: aggregation is probably not necessary, but if you're looking at a different spid, sysprocesses can sometimes return multiple rows.

    I absolutely agree... Delta-T's using GETDATE() are only a surface indication that something may be going right or wrong. The reason why many use it on this forum is because it's so simple. I'll frequently use SET STATISTICS IO ON and SET STATISTICS TIME ON because they actually show more information than a dip into SysProcesses does. Of course, on any looping code, those are just totally ineffective. I'm slowly but surely getting into the habit of having Profiler running for batch completions (all columns selected) with a filter on the particular SPID I happen to be testing on/with.

    The other thing is that sometimes CPU time isn't enough... it's also very necessary to see what the I/O activity is either from a cache standpoint or actual hard disk access.
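    For anyone who hasn't used SET STATISTICS IO/TIME, here is a minimal sketch of the pattern (dbo.SomeTable is just a placeholder; the CPU time, elapsed time, and reads show up on the Messages tab):

    SET STATISTICS IO ON
    SET STATISTICS TIME ON

    -- the query under test goes here
    SELECT COUNT(*) FROM dbo.SomeTable

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF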

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thomas (4/15/2009)


    However, I have, in my travels, run into situations where even though a set-based solution existed, it performed worse than a cursor in that particular version of the DBMS for that particular problem. I suppose it is akin to denormalizing. You have to know the reasons for normalizing and be versed in its use before you can consciously decide to deviate for a particular solution.

    Absolutely correct... except that most of those "set-based" solutions that perform worse than a cursor aren't actually set-based. They just look like it. Just because it has no While Loop or explicit RBAR doesn't mean it's set-based. Those poor performing solutions are giving "good" set-based code a very bad reputation. One of the more common and "deadly" reasons for such poor performance can be found in the following article...

    http://www.sqlservercentral.com/articles/T-SQL/61539/

    --Jeff Moden



  • ZA_Crafty (4/15/2009)


    gautamsheth2000 (4/13/2009)


    Should use this code 🙂

    Select count(*) From master.sys.columns C1 CROSS JOIN master.sys.columns C2

    You can even do one better.

    using the PK in count is faster than *

    ie,

    Select count(id) From master.sys.columns C1 CROSS JOIN master.sys.columns C2

    Heh... I'm thinking that no one has actually tried the second piece of code. It produces an "Invalid column name" error, because sys.columns (unlike the old dbo.syscolumns) has no "id" column.

    Also, while SELECT * can be deadly to performance, it's a myth that counting a PK column is faster than counting using COUNT(*). But don't take my word for it... test it. Since Master.sys.columns is actually a view and not a table, let's build our own million row table and see what happens...

    [font="Courier New"]--===== Create and populate a 1,000,000 row test table.

         -- Column "RowNum" has a range of 1 to 100,000 unique numbers

         -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

         -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings

         -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers

         -- Column "SomeDate" has a range of  >=01/01/2000 and <01/01/2010 non-unique date/times

         -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'

         --        for all rows.

         -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)

     SELECT TOP 1000000

            SomeID       = IDENTITY(INT,1,1),

            SomeInt      = ABS(CHECKSUM(NEWID()))%50000+1,

            SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

                         + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

            SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

            SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),

            SomeDate     = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

            SomeHex12    = RIGHT(NEWID(),12)

       INTO dbo.JBMTest

       FROM Master.dbo.SysColumns t1,

            Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned

         -- Takes about 1 second to execute.

      ALTER TABLE dbo.JBMTest

            ADD PRIMARY KEY CLUSTERED (SomeID)[/font]

    And then let's test the two counts. I recommend running these more than once because both are very fast and casual system activity can make it look like either is superior for any given run...

    [font="Courier New"]    SET STATISTICS TIME ON

     SELECT COUNT(*)      FROM dbo.JBMTest

     SELECT COUNT(SomeIDFROM dbo.JBMTest

        SET STATISTICS TIME OFF[/font]

    If you run that code with the actual execution plan turned on, you'll also see that both pieces of code do a clustered index scan... just in case no one knows what that really means, it's the same thing as a slightly more intelligent table scan because that's where the clustered index lives.

    There is no advantage to counting a given column compared to counting using "*". There may, however, be a huge disadvantage if you pick the wrong column. If you pick the PK column, then COUNT(pkcolumn) and COUNT(*) are functionally equivalent because the PK will not allow nulls. If, however, you pick a column that does allow nulls, the answer can be very different...

    [font="Courier New"] SELECT COUNT(*),COUNT(SomeCSVFROM dbo.JBMTest

     UPDATE dbo.JbmTest

        SET SomeCSV NULL

     SELECT COUNT(*),COUNT(SomeCSVFROM dbo.JBMTest[/font]

    My recommendation is that if you want to use COUNT to determine the total number of rows in a table, then use COUNT(*) and not COUNT(columnname).

    --Jeff Moden



  • dejan.kelemen (4/15/2009)


    I'm all for declarative and rarely or almost never use while loops or cursors, but how would I go about doing this without a while loop... btw, I'm writing this in a hurry, so please go easy on me 🙂

    create table ConditionTable

    ...

    Let me know how this works:

    create table ConditionTable(
    ConditionTableKey int identity(1,1) primary key not null,
    Condition1 bit,
    Condition2 bit,
    Condition3 datetime)

    create table DataTable(
    DataTableKey int identity(1,1) primary key not null,
    DataName varchar (20),
    Canceled bit,
    Active bit,
    SysDate datetime)

    insert into ConditionTable (Condition1, Condition2, Condition3) values ( 1, 1, '20090425' )
    insert into ConditionTable (Condition1, Condition2, Condition3) values ( null, 1, '20090426' )
    insert into ConditionTable (Condition1, Condition2, Condition3) values ( 0, 1, '20090427' )
    insert into ConditionTable (Condition1, Condition2, Condition3) values ( 1, 1, '20090427' )

    insert into DataTable( DataName, Canceled, Active, SysDate ) values ( 'Data1', 1, 1, '20090427' )
    insert into DataTable( DataName, Canceled, Active, SysDate ) values ( 'Data1', 0, 1, '20090426' )
    insert into DataTable( DataName, Canceled, Active, SysDate ) values ( 'Data2', 0, 1, '20090424' )
    insert into DataTable( DataName, Canceled, Active, SysDate ) values ( 'Data1', 0, 1, '20090425' )
    insert into DataTable( DataName, Canceled, Active, SysDate ) values ( 'Data3', 1, 1, '20090426' )
    insert into DataTable( DataName, Canceled, Active, SysDate ) values ( 'Data1', 1, 0, '20090427' )
    insert into DataTable( DataName, Canceled, Active, SysDate ) values ( 'Data3', 1, 0, '20090425' )

    drop table #Result

    create table #Result(
    DataTableKey int,
    DataName varchar (20),
    Canceled bit,
    Active bit,
    SysDate datetime)

    declare @MaxCondition int
    select @MaxCondition = max(ConditionTableKey)
    from ConditionTable

    Select DataTableKey, DataName, Canceled, Active, SysDate
    From (select D.*,
                 ROW_NUMBER() over(partition by d.DataName order by DataTableKey) as NameCnt
          from DataTable D
          Left Join ConditionTable C
            ON C.ConditionTableKey <= @MaxCondition
           And (c.Condition1 is null or d.Canceled = c.Condition1)
           and (c.Condition2 is null or d.Active = c.Condition2)
           and (c.Condition3 is null or d.SysDate = c.Condition3)
          ) DD
    Where DD.NameCnt = 1

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • john.arnott (4/15/2009)


    Dejan,

    This is a classic example of how powerful the row_number() function can be. I coded this solution using a CTE out of personal preference, although I suppose that part could have been a subquery. I also retained the "where.... not in result" clause, although the core query in the CTE doesn't depend on it. One more thing: the cursor-based code didn't specify the precedence whereby one data row would be preferred over the rest with the same value of dataname, so I guessed at SysDate.

    drop table #Result

    create table #Result
    (
    DataTableKey int,
    DataName varchar (20),
    Canceled bit,
    Active bit,
    SysDate datetime
    )

    ;with cte (datarownum, datatablekey, dataname, canceled, active, sysdate) as
    (
    select datarownum = row_number() over (partition by d.dataname order by d.sysdate)
          ,d.DataTableKey, d.DataName, d.Canceled, d.Active, d.SysDate
    from DataTable d
    join conditiontable c on (d.Canceled = c.condition1 or c.condition1 is null)
                         and (d.Active = c.condition2 or c.condition2 is null)
                         and (d.SysDate = c.condition3 or c.condition3 is null)
    )
    Insert into #Result
    select datatablekey, dataname, canceled, active, sysdate
    from cte
    where datarownum = 1
    and DataName not in (select DataName from #Result)

    select *
    from #Result

    Oops, John beat me to it! 🙂

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (4/15/2009)


    Thomas (4/14/2009)


    A simple example of where set based operations break down is in very large data sets. I want to take a table of 10 million rows and copy it to another table in a production system. The obvious solution would seem to be something like:

    Insert DestTable

    Select ...

    From TableA

    Yet, at the number of rows we are discussing, this could take hours to execute even on the best hardware especially if the source table and the destination table are separated by a slower link.

    This is mentioned in my article. The reason that the set-based solution is a problem is that it attempts to perform the task as quickly as possible. Because even that still takes quite a long time, it effectively (or sometimes literally) locks out access to the table for a very long time. The advantage of Cursor and WHILE loop approaches here is actually that they are so slow at it, they leave plenty of resource and access gaps for other processes to get past them. And it is easy to use "chunks" and other techniques to make them even slower. On this point, as I said in my article, I agree: a Cursor is fine IF you actually want to make something slow.

    I disagree. IMO, the reason the set-based solution is slow is that it is trying to do too much at the same time. In a chunked solution, there are checkpoints along the way that release resources. Just trying to put so many records into a single transaction is likely a huge source of the problem as the log file grows to ginormous proportions.

    That said, it should be noted that there are also other ways to accomplish the same thing that do not use cursors.

    Agreed, and almost all of those solutions involve an outside application (including SSIS) to chunk the data.

  • ... And I was wondering... 🙂

    What are your thoughts about Adam Machanic's article:

    http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/running-sums-redux.aspx ?

    I tried to compare Adam's code to other options and came up with this one that was the closest in performance, but still a bit heavier (using Profiler):

    declare @ToID int, @ID int
    DECLARE @TransactionID INT
    DECLARE @ActualCost MONEY, @RunningTotal MONEY

    SELECT ID = IDENTITY(int,1,1), convert(INT,TransactionID) as TransactionID, ActualCost
    into #tmp
    FROM Production.TransactionHistory
    ORDER BY TransactionID

    set @ToID = @@ROWCOUNT

    create unique index idx_tmp on #tmp(ID)

    DECLARE @Results TABLE
    (
    TransactionID INT NOT NULL PRIMARY KEY,
    ActualCost MONEY,
    RunningTotal MONEY
    )

    select @ID = 1, @RunningTotal = 0

    WHILE @ID <= @ToID
    begin
        SELECT @TransactionID = TransactionID,
               @ActualCost = ActualCost,
               @RunningTotal = @RunningTotal + ActualCost
        from #tmp
        WHERE ID = @ID

        INSERT @Results
        VALUES (@TransactionID, @ActualCost, @RunningTotal)

        SET @ID = @ID + 1
    end

    SELECT *
    FROM @Results
    ORDER BY TransactionID

    drop table #tmp

    It can probably still be tuned, but not by much.

    Thanks,

    Michelle.

  • RBarryYoung (4/15/2009)


    Bruce W Cassidy (4/14/2009)


    Thomas (4/14/2009)


    Set-based operations do not natively provide for chunking and thus we have cursors. Of course there are other ways around this problem, but at the end of day the only reason we are forced to consider alternate solutions is because the Hoyle set-based solution is insufficient. Even using SSIS (which is frankly a heinous beast) is simply having another procedural language loop through chunks of set-based statements.

    [font="Verdana"]Agreed (both on SSIS and on the necessity for "chunking", although I call this batching (same concept). I wouldn't use a cursor for that though -- I would use a combination of TOP and a loop.[/font]

    And I would use a "Batch" Job for that: a SQL Agent recurring Job that also uses TOP. It has a lot of advantages around built-in monitoring and control. That is, the Job History log, job step output files and the abiltiy to change both its chunk size and recurrence non-disruptively, plus I do not have to keep a live client session open for it. It will survive crashes, power-losses and helpful janitors shutting down your workstation to save energy over the weekend. It really sucks to start a 48-hour chunking/batching process on Friday evening and find out on Monday morning that some minor glitch aborted it 2 hours in.

    Yep. I agree. However, recognize that we are again having to find alternative solutions because the base set based solution is insufficient.
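    For reference, here is a minimal sketch of the TOP-and-loop batching Bruce describes (the table, key, and column names are hypothetical placeholders, not from the thread):

    -- Copy dbo.TableA to dbo.DestTable in 50,000-row batches; each batch commits
    -- on its own, so locks are released between batches and the log has a chance
    -- to clear (depending on recovery model).
    DECLARE @BatchSize INT
    SET @BatchSize = 50000

    WHILE 1 = 1
    BEGIN
        INSERT INTO dbo.DestTable (ID, Col1, Col2)
        SELECT TOP (@BatchSize) s.ID, s.Col1, s.Col2
          FROM dbo.TableA s
         WHERE NOT EXISTS (SELECT 1 FROM dbo.DestTable d WHERE d.ID = s.ID)
         ORDER BY s.ID

        IF @@ROWCOUNT < @BatchSize BREAK   -- last (partial) batch copied
    END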

  • Manie Verster (4/14/2009)


    Steve had an article a couple of days ago about breaking down code in a stored proc/query to make it more readable and editable. Sometimes cursors help you to do just that.

    Heh... I suppose it's a matter of what you "cut your teeth" on. I find cursors to be much more complicated than set based code and cursor code usually has more lines of code, to boot.

    --Jeff Moden



  • Michelle Gutzait (4/15/2009)


    ... And I was wondering... 🙂

    What are your thoughts about Adam Machanic's article:

    http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/running-sums-redux.aspx ?

    I tried to compare Adam's code to other options and came up with this one that was the closest in performance, but still a bit heavier (using Profiler)... (code snipped; it is identical to Michelle's post above)

    There is an example of what Jeff Moden calls "the quirky update" in my article if you'd like to see something fast. I'd direct you to Jeff's article, but he is currently rewriting it and it isn't available. I keep prodding him about it, as it explains well how and why it works.
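    For the curious, here is a minimal sketch of that variable-accumulation ("quirky update") pattern applied to Michelle's #tmp table. Two assumptions that are not in her original code: #tmp gets a RunningTotal column, and its index on ID is created as a unique CLUSTERED index (the technique depends on clustered-index order and single-threaded execution):

    ALTER TABLE #tmp ADD RunningTotal MONEY

    DECLARE @RunningTotal MONEY
    SET @RunningTotal = 0

    UPDATE t
       SET @RunningTotal = RunningTotal = @RunningTotal + ActualCost
      FROM #tmp t WITH (TABLOCKX)   -- serialize access while the totals are written
    OPTION (MAXDOP 1)               -- keep it single-threaded, in clustered-index order

    SELECT TransactionID, ActualCost, RunningTotal
      FROM #tmp
     ORDER BY TransactionID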

  • SanjayAttray (4/14/2009)


    Interesting article. I'll be waiting for the second installment. Cursors are bad (period), but in some situations we have to use them. It becomes necessary.

    The example pretty much shows why not to use cursors. It took 4:49 minutes to execute, while select count(1) From master..syscolumns c1 Cross Join master..syscolumns c2 takes 1 second.

    Execution plan interestingly shows

    Query 1: Query cost (relative to the batch): 0.00%

    Actually, I believe Grant Fritchey covered that in his fine book about execution plans. One of the big problems with Cursors, While Loops, Recursion, and some UDFs is that they show the estimated execution plan for just the first iteration, because the optimizer cannot anticipate more than one row without actually running the code. Makes it real hard to tune things.

    --Jeff Moden



  • James Tran (4/14/2009)


    Good article RBarry Young. I got rid of my cursor by

    - Store the result data in table var with rownumber

    - create a row index var to loop thru each record

    - while there are still record available, Select @Var1 = Col1 from @Table where @rowIdx < colRowNum

    Oh, you said "While"... one of the recommendations from many, many supposed "performance consultants" is to replace a Cursor with a Temp table and a While Loop. Lets ask the question, "What's the difference between a Cursor and a Temp Table/While Loop combination?" If the Cursor is a "firehose" cursor (read only, forward only), the answer is [font="Arial Black"]NOTHING[/font] except how you proceed to the next row. They're both RBAR, they both take approximately the same time to execute, and they'll both take approximately the same number of resources... a lot.

    The combination of Temp Tables and While Loops is simply not a reasonable substitute for Cursors. If you cannot figure out a proper (there are many improper ways) set-based solution, then changing a Cursor to a Temp Table/While Loop combo is simply an exercise in futility. In that case, just change the cursor to a firehose cursor and call it a day.
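    For reference, a "firehose" cursor is just one declared forward-only and read-only, which in T-SQL you can get with the FAST_FORWARD option (the SELECT below is only a placeholder):

    DECLARE c CURSOR LOCAL FAST_FORWARD   -- forward-only, read-only
        FOR SELECT SomeID FROM dbo.JBMTest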

    --Jeff Moden



  • Andy DBA (4/15/2009)


    So, getting back to my question, what I'm driving at is that using aggregated dynamic sql to call an sp multiple times (instead of a loop) requires more than meets the eye if the developer seeks to protect that sp by not granting execute rights on it. Unlike RBAR code, dynamic sql breaks the ownership chain because it executes in the context of the user login instead of the sp's author.

    To preserve the ownership chain (on SQL Server 2005 or better) the developer can sign the sp with a certificate or use the problematic EXECUTE AS and, voila, the dreaded cursor has been replaced per your article's claim, but this comes at the expense of increased complexity and maintenance workload.

    Ah, OK, I understand. And you're right, using dynamic SQL isn't always the pristine affair that we make it out to be in examples. However, I have been quite happy with EXECUTE AS to address the above problem; it does take a little bit of forethought and planning, but once done I haven't noticed any appreciable increase in maintenance.
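    For illustration, a minimal sketch of the EXECUTE AS approach (the procedure and table names here are hypothetical, not from the thread):

    -- The dynamic SQL runs in the owner's context, so callers only need
    -- EXECUTE on the procedure, not SELECT rights on the underlying table.
    CREATE PROCEDURE dbo.GetRowCount
        @TableName SYSNAME
    WITH EXECUTE AS OWNER
    AS
    BEGIN
        DECLARE @sql NVARCHAR(400)
        SET @sql = N'SELECT COUNT(*) FROM ' + QUOTENAME(@TableName)
        EXEC sp_executesql @sql
    END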

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Perhaps people are missing the point about the While loop for populating a set of numbers vs. the cross join. First, let's all agree that this would normally be a one-time occurrence. However, imagine for the moment that such code was needed for a real problem. The While loop as I wrote it originally is pretty close to self-documenting. Almost any developer who has written any amount of code would recognize the loop and what it is doing, even if they only had a passing familiarity with the Insert statement. (In fact, I showed both queries to some of the developers in my office and everyone understood the While loop by looking at it, and only one person (albeit the most db-savvy), after a few minutes of thought, understood what the cross join was doing.) The cross join requires a bit of analysis. Why are they using sys.columns? Why are they cross joining to them? I would want a developer using this trick to add a brief comment to explain what they were trying to accomplish with this chunk of code. Most developers would call it a "clever" solution. "Clever" in this case meaning that it is not obvious but solves the problem in an ingenious way.

    Furthermore, there are times when you simply must use a cursor. For example, if you walk into an existing infrastructure where they have encapsulated some complex logic in a stored procedure and you need to execute that stored procedure on a queried set of data, you must use a cursor. We can quibble all day about whether the logic should have been encapsulated in the stored proc in the first place, or whether you can take the logic from the stored proc and put it into a single script, or whether it is possible to put the logic in a UDF, but all of that completely misses the point. If you don't want to re-write said logic and you are required to execute that logic on each row of a set of data, then you are forced to use cursors.
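    As a concrete illustration of that scenario, here is a sketch (dbo.ProcessOrder and dbo.Orders are hypothetical; the point is the pattern of calling an existing proc once per qualifying row):

    DECLARE @OrderID INT

    DECLARE order_cur CURSOR LOCAL FAST_FORWARD
        FOR SELECT OrderID FROM dbo.Orders WHERE Status = 'Pending'

    OPEN order_cur
    FETCH NEXT FROM order_cur INTO @OrderID
    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC dbo.ProcessOrder @OrderID   -- the encapsulated logic we don't want to rewrite
        FETCH NEXT FROM order_cur INTO @OrderID
    END
    CLOSE order_cur
    DEALLOCATE order_cur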

    Should you avoid cursors as much as possible: yes. Should you assume that you can always avoid them: no. Should you forever exclude them from your arsenal of solutions: again no.
