There Must Be 15 Ways To Lose Your Cursors... part 1, Introduction

  • Girish Bhat (4/15/2009)


    Thomas (4/14/2009)


    This type of discussion reminds me of heated arguments I have had with Mr Celko about SQL standards......

    A simple example of where set based operations break down is in very large data sets. I want to take a table of 10 million rows and copy it to another table in a production system. The obvious solution would seem to be something like:

    Insert DestTable

    Select ...

    From TableA

    Yet, at the number of rows we are discussing, this could take hours to execute even on the best hardware ....

    Another common use of cursors is to script a DDL change. For example, I want to write a script that will alter all columns in all tables call "Foo" to be nvarchar(10) instead of nvarchar(5). That is simply not possible using a set based solution......

    Hi Thomas,

    For moving 10 million rows, I would use BCP not SQL.

    For the second example, you could use:

    sp_MSforeachtable

    @command1 = "Alter Table ? Alter Column Foo nvarchar(10)",

    @whereand = "and OBJECT_ID in (select object_ID from sys.columns where name = 'Foo')"

    BCP only works if you can output the original source to a file. It does not work if you need to query an active table. Regardless, as I mentioned, there are plenty of other options which we have to consider because the set-based syntax we are provided is insufficient.

  • Couple of very simple solutions.

    First one is SQL 2005/2008 only, won't work in 2000:

    create table dbo.Numbers (Number int primary key);

    go

    insert into dbo.Numbers (Number)

    select top 10000 row_number() over (order by t1.object_id)

    from master.sys.columns t1

    cross join master.sys.columns t2;

    That does not qualify as "simple" IMO and is a bit of a hack, but I get the idea.

    Another 2k5 version, if you don't want to select from system tables/views/etc.

    create table dbo.Numbers (Number int primary key);

    go

    ;with

    CTE (Number) as

    (select 1 union 2 union 3 union 4 union 5 union 6 union 7 union 8 union 9 union 10)

    CTE2 (Number) as

    (select c1.Number

    from CTE c1

    cross join CTE c2)

    CTE3 (Number) as

    (select c3.Number

    from CTE2 c3

    cross join CTE2 c4)

    insert into dbo.Numbers (Number)

    select top 10000 row_number() over (order by Number)

    from CTE3;

    Yes, albeit it takes more to write than to execute.

    For an SQL 2000 version, try this:

    create table dbo.Numbers (Number int identity primary key, PlaceHolder bit);

    go

    insert into dbo.Numbers (PlaceHolder)

    select top 10000 null

    from master.dbo.syscolumns

    cross join master.dbo.syscolumns;

    go

    alter table dbo.Numbers

    drop column PlaceHolder;

    This typical of what I have seen.

    In 2000, 2005 or 2008, this will also work, but it's just a loop that doesn't use While or a cursor.

    create table dbo.Numbers (Number int identity primary key)

    go

    set nocount on;

    go

    insert into dbo.Numbers

    default values;

    go 10000

    That is a new one! I never knew that you could give the Go command a counter and have it execute in a loop.

    All of these are faster than cursors/loops.

    Actually, not true. In my tests the above solutions at best matched the cursor or were beat by it and the cursor certainly smoked all of the above in having to devise ingenuity for what ought to be a simple solution.

  • I stand corrected. Cross joining on syscolumns does appear to be a bit faster. Perhaps this is one loop based solution that now has a reasonable set-based solution.

  • Thomas (4/15/2009)


    All of these are faster than cursors/loops.

    Actually, not true. In my tests the above solutions at best matched the cursor or were beat by it and the cursor certainly smoked all of the above in having to devise ingenuity for what ought to be a simple solution.

    Hi Thomas,

    but that seems to be the problem with the use of cursors throughout a database solution. Many developers do not know these "ingeneous" ways of solving a problem. I am not saying that a developer that uses a cursor is a bad developer, what I mean is that the other set-based methods are unknown so cannot be used.

    I know that if the series has the same quality that RBarryYoung produces here on a daily basis, it will be a wonderful primer for quite a few database developers that I know.

    regards

    GermanDBA

    Regards,

    WilliamD

  • but that seems to be the problem with the use of cursors throughout a database solution. Many developers do not know these "ingeneous" ways of solving a problem. I am not saying that a developer that uses a cursor is a bad developer, what I mean is that the other set-based methods are unknown so cannot be used.

    That is a function of intuitiveness of the language. Cursors and loops are more intuitive than set-based solutions in some cases. Take our example of populating a table with a series of numbers. Running through a loop and inserting an incrementing value is probably more intuitive to people than cross joining on sys.columns.

    Set-based solutions are generally more future proof and obviously their use is a factor of knowing solutions which is obviously the point of the article. However, I have, in my travels, run into situations where even though a set-based solution existed, it performed worse than a cursor in that particular version of the DBMS for that particular problem. I suppose it is akin to denormalizing. You have to know the reasons for normalizing and be versed in its use before you can consciously decide to deviate for a particular solution.

  • Thomas (4/15/2009)


    I stand corrected. Cross joining on syscolumns does appear to be a bit faster. Perhaps this is one loop based solution that now has a reasonable set-based solution.

    Take it further, try comparing the cursor (loop method) of building a Tally table to the other methods provided with a million values.

    I think you'll find the set based methods scale a lot better.

  • Thomas (4/15/2009)


    Couple of very simple solutions.

    First one is SQL 2005/2008 only, won't work in 2000:

    create table dbo.Numbers (Number int primary key);

    go

    insert into dbo.Numbers (Number)

    select top 10000 row_number() over (order by t1.object_id)

    from master.sys.columns t1

    cross join master.sys.columns t2;

    That does not qualify as "simple" IMO and is a bit of a hack, but I get the idea.

    . . . . . .

    I have to disagree.

    This solution includes the CREATE statement for the results table. Surely that's not what disqualifies it from being "simple".

    Then it limits the results with "TOP 10000". That's got to be considered simple.

    The domain is a cross join of master.sys.columns upon itself, providing a large supply of combinations to count. Perhaps that's not an obvious technique, but even if one is unfamiliar with "cross join", once you find out what that does, it's hardly complicated.

    That would leave the row_number() function with its parameters, "over (order by t1.object_id). This function, and its brothers RANK(), DENSERANK() and NTILE(), are ranking functions new to SQL with the 2005 version. Rather than saying they're not simple, or are "a bit of a hack", I'd fully agree that they're a features that, until you're familiar with them, are not intuitively grasped. As Barry points out in his article, the introduction of the ranking functions in SQL 2005 was a major step in helping us write fully set-based, declarative SQL, and allowing us to put aside procedural code such as cursors and while loops. This is exactly the sort of thing the ranking functions are there for.

    It took me some time to understand these functions and how they may be useful, but the effort was more than worth while. These are new functions, not extremely difficult to understand with a bit of effort, and very powerful additions to the SQL toolbox. I strongly encourage you to make that effort.

  • RBarryYoung (4/14/2009)


    ... (Note: Wikipedia is using revisionist terminology that would instead call this "imperative programming", but I like the old terms better).

    I haven't heard the term revisionist terminology before but I have a need for it, and I dig your use of it here. It applies in so many situations these days, my resume included. Although I was in the Data Integration department at the Big Bank for years, my resume now reflects that work as ETL. Same concept, different century.

    BTW I hadn't replied yet, but I really appreciated the readability of the article and look forward to your follow ups. I have been using SQL since 1998, and TSQL since 2005. I still fall back on cursors occasionally but have been very disappointed with their performance in SQL Server and will be reading eagerly to see if your examples apply to some of the work I am doing. Mostly I use them to read through system or driver tables to create iterative processes. e.g. repeated table operations using a cursor to spin through the sys.objects and dynamic code to allow table name substitution. Stuff like that.

    😎 Kate The Great :w00t:
    If you don't have time to do it right the first time, where will you find time to do it again?

  • I'm all for declarative and rarely or almost never use while loops or cursors, but how would I go about doing this without while loop...btw I'm writing this in a hurry, so please go easy on me 🙂

    create table ConditionTable

    (

    ConditionTableKey int identity(1,1) primary key not null,

    Condition1 bit,

    Condition2 bit,

    Condition3 datetime

    )

    create table DataTable

    (

    DataTableKey int identity(1,1) primary key not null,

    DataName varchar (20),

    Canceled bit,

    Active bit,

    SysDate datetime

    )

    insert into ConditionTable

    (

    Condition1,

    Condition2,

    Condition3

    )

    values(

    1,

    1,

    '20090425'

    )

    insert into ConditionTable

    (

    Condition1,

    Condition2,

    Condition3

    )

    values(

    null,

    1,

    '20090426'

    )

    insert into ConditionTable

    (

    Condition1,

    Condition2,

    Condition3

    )

    values(

    0,

    1,

    '20090427'

    )

    insert into ConditionTable

    (

    Condition1,

    Condition2,

    Condition3

    )

    values(

    1,

    1,

    '20090427'

    )

    insert into DataTable

    (

    DataName,

    Canceled,

    Active,

    SysDate

    )

    values(

    'Data1',

    1,

    1,

    '20090427'

    )

    insert into DataTable

    (

    DataName,

    Canceled,

    Active,

    SysDate

    )

    values(

    'Data1',

    0,

    1,

    '20090426'

    )

    insert into DataTable

    (

    DataName,

    Canceled,

    Active,

    SysDate

    )

    values(

    'Data2',

    0,

    1,

    '20090424'

    )

    insert into DataTable

    (

    DataName,

    Canceled,

    Active,

    SysDate

    )

    values(

    'Data1',

    0,

    1,

    '20090425'

    )

    insert into DataTable

    (

    DataName,

    Canceled,

    Active,

    SysDate

    )

    values(

    'Data3',

    1,

    1,

    '20090426'

    )

    insert into DataTable

    (

    DataName,

    Canceled,

    Active,

    SysDate

    )

    values(

    'Data1',

    1,

    0,

    '20090427'

    )

    insert into DataTable

    (

    DataName,

    Canceled,

    Active,

    SysDate

    )

    values(

    'Data3',

    1,

    0,

    '20090425'

    )

    drop table #Result

    create table #Result

    (

    DataTableKey int,

    DataName varchar (20),

    Canceled bit,

    Active bit,

    SysDate datetime

    )

    declare @MaxCondition int

    declare @Counter int

    declare @Condition1 bit

    declare @Condition2 bit

    declare @Condition3 datetime

    select @MaxCondition = max (ConditionTableKey)

    from ConditionTable

    select @Counter = 1

    while (@Counter <= @MaxCondition)

    begin

    print @Counter

    select@Condition1 = Condition1,

    @Condition2 = Condition2,

    @Condition3 = Condition3

    fromConditionTable

    whereConditionTableKey = @Counter

    insert into#Result

    select*

    fromDataTable

    where(@Condition1 is null or Canceled = @Condition1)

    and(@Condition2 is null or Active = @Condition2)

    and(@Condition3 is null or SysDate = @Condition3)

    andDataName not in (select DataName from #Result)

    select @Counter = @Counter + 1

    end

    select*

    from#Result

  • Thomas (4/15/2009)


    Couple of very simple solutions.

    First one is SQL 2005/2008 only, won't work in 2000:

    create table dbo.Numbers (Number int primary key);

    go

    insert into dbo.Numbers (Number)

    select top 10000 row_number() over (order by t1.object_id)

    from master.sys.columns t1

    cross join master.sys.columns t2;

    That does not qualify as "simple" IMO and is a bit of a hack, but I get the idea.

    In what way is a basic Insert...Select not simple and a bit of a hack? You lost me on that one.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Second day and I'm still enjoying the discussion. No matter which side you're on (I think the vast majority here are on the correct side), the good thing about hot topics is all the other great stuff that comes out for those of us who are still learning! Thanks Again RBarry!

  • Dejan,

    This is a classic example of how powerful the row_number() function can be. I coded this solution using a CTE out of personal preference, although I suppose that part may have be in a subquery. I also retained the "where.... not in result" clause, although the core query in the CTE doesn't depend on it. One more thing is that the cursor-based code didn't specify the precedence whereby one data row would be preferred over the rest with the same value of dataname, so I guessed at SysDate.

    drop table #Result

    create table #Result

    (

    DataTableKey int,

    DataName varchar (20),

    Canceled bit,

    Active bit,

    SysDate datetime

    )

    ;with cte (datarownum, datatablekey, dataname, canceled, active, sysdate) as

    (

    select datarownum = row_number() over (partition by d.dataname order by d.sysdate)

    ,d.DataTableKey, d.DataName, d.Canceled, d.Active, d.SysDate

    from DataTable d

    join conditiontable c on (d.Canceled = c.condition1 or c.condition1 is null)

    and (d.Active = c.condition2 or c.condition2 is null)

    and (d.SysDate = c.condition3 or c.condition3 is null)

    )

    Insert into #Result

    select datatablekey, dataname, canceled, active, sysdate

    from cte

    where datarownum = 1

    and DataName not in (select DataName from #Result)

    select *

    from #Result

  • RBarryYoung (4/14/2009)


    Andy DBA (4/14/2009)


    Also wondering if ownership chaining is still an issue with dynamic sql. Pretty sure I can write an sp that wraps sp_dont_touch_this(@someargument varchar(25)) in a cursor loop and grant execute on it to someuser without having to grant execute on sp_dont_touch_this to someuser. Is the same still not true for dynamic SQL in SQL 2005?

    Not sure what you are driving at here or what the concern is. Do you have a runnable example?

    First, thanks for the reply. Your article generated quite the hail storm and IMO you're doing a remarkable job getting back to people. Second, I agree wholeheartedly with the spirit of your article. A major portion of my job involves maintenance on a third party product riddled with unnecessary cursors nested high and deep. (RBAR ad-nauseum.) Sadly the product's developers are unconcerned with cpu, i/o, or concurrence. Hopefully articles such as yours may help to enlighten future generations with the power and speed of set based solutions.

    However, like it or not, there are times when many of us can't rewrite an sp to accept a CTE or perform a set based operation. Whether it's sp_sendmail, some other MS system sp, or a third party sp, we are stuck calling it a bunch of times with a set of values.

    So, getting back to my question, what I'm driving at is that using aggregated dynamic sql to call an sp multiple times (instead of a loop) requires more than meets the eye if the developer seeks to protect that sp by not granting execute rights on it. Unlike RBAR code, dynamic sql breaks the ownership chain because it executes in the context of the user login instead of the sp's author.

    To preserve the ownership chain (on SQL Server 2005 or better) the developer can sign the sp with a certificate or use the problematic EXECUTE AS and voila, the dreaded cursor has been replaced per your article's claim, but this comes with the expense of increased complexity and maintenance workload.

    Also, I'm betting execution plan caching is better in this case with RBAR than with dynamic sql, but I might be surprised with SQL Server 2005.

    I'm still curious. Has anyone benchmarked aggregated dynamic sql against tuned RBAR or has everyone just used up all of their spare time reading this long winded post? 😉

  • Thomas (4/15/2009)


    I stand corrected. Cross joining on syscolumns does appear to be a bit faster. Perhaps this is one loop based solution that now has a reasonable set-based solution.

    Thomas (4/15/2009)


    Actually, not true. In my tests the above solutions at best matched the cursor or were beat by it and the cursor certainly smoked all of the above in having to devise ingenuity for what ought to be a simple solution.

    Not sure how you're testing these. I just ran all of these as tests. The loop method took over 2 seconds to populate the table. 2656 milliseconds, to be precise.

    The first example I gave took 47 milliseconds. That's certainly not "does appear a bit faster". It's 1/56th the time.

    The SQL 2000 version took 100 milliseconds, including dropping the PlaceHolder column. Again, over 20x faster.

    The only one that was about the same as the loop was the GO 10000 method, which took a little over 3 seconds.

    And as for simplicity vs complexity, I guess I don't see how 4 lines of code is less complex than 7 lines, and I don't see how Insert...Select is more complex than While Begin...End. They're both very simple, basic concepts.

    Just to be clear, here's how I tested the loop:

    if object_id(N'Numbers','U') is not null

    drop table dbo.Numbers;

    go

    create table dbo.Numbers (Value int primary key);

    go

    declare @N datetime; set @N = getdate();

    Declare @i int

    Set @i = 0

    While @i < 10000

    Begin

    Insert Numbers(Value) Values(@i)

    Set @i = @i + 1

    End

    select datediff(millisecond, @N, getdate());

    Here's how I tested the first sample:

    if object_id(N'Numbers','U') is not null

    drop table dbo.Numbers;

    go

    create table dbo.Numbers (Number int primary key);

    go

    declare @N datetime; set @N = getdate();

    insert into dbo.Numbers (Number)

    select top 10000 row_number() over (order by t1.object_id)

    from master.sys.columns t1

    cross join master.sys.columns t2;

    select datediff(millisecond, @N, getdate());

    And so on.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Joe Celko (4/15/2009)


    >> You want to build a numbers table that simply contains a series of sequential integers.<<

    One of the Newsgroups has a thread with a dozen plus ways that do not use a cursor and run orders of magnitude faster. I happen to like the highly portable and parallelizable solution:

    WITH Digits (i)

    AS

    (SELECT i

    FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (0))

    AS D(i)

    )

    SELECT 1 + D1.i + 10*D2.i + 100*D3.i + 1000*D4.i

    FROM Digits AS D1, Digits AS D2, Digits AS D3, Digits AS D4;

    Joe, the problem with that one is it doesn't work in SQL 2000 or 2005, which are still much more common than 2008. Otherwise, it's an interesting solution.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 15 posts - 151 through 165 (of 380 total)

You must be logged in to reply to this topic. Login to reply