There Must Be 15 Ways To Lose Your Cursors… Part 2: Just Put It in a S

  • min.li (4/28/2009)


    Hello,

    I recently had to use curosr to do a function in my work. Although I knew I should try to avoid cursor, but could not find a way. I wonder if you can figure out doing it without cursor.

    I have a table like this:

    Name RowNumber

    RL 1

    RL 2

    ..

    RL 10

    SL 11

    SL

    .. 15

    RL 16

    ..

    RL 30

    SL 31

    .. 41

    The RowNumber is granteed to be in sequence and there are actually more than two values for the 'Name' column, only two listed here for simplicity.

    The result of the function I need is a return table varaible like

    Name startRowNumber endRowNumber

    RL 1 10

    SL 11 15

    RL 16 30

    SL 31 41

    I am really curious if there is a way to achieve it without using cursor.

    Thanks.

    Your case is hard to folllow, can you at least make a test script with some data in SQL form so we use that and don't each have to make our own? This will greatly increase the response to your particular issue!

  • Hi,

    Firstly, test table can be prepared as:

    IF OBJECT_ID('dbo.TestData') IS NOT NULL

    DROP TABLE dbo.TestData

    CREATE TABLE dbo.TestData

    (Name varchar(4) not null,

    RowNumber int not null

    )

    DECLARE @Counter int

    SET @Counter = 1

    WHILE @Counter <= 100

    BEGIN

    IF ((@Counter/10)%2=0)

    INSERT INTO dbo.TestData(Name, RowNumber) VALUES ('RL', @Counter)

    ELSE

    INSERT INTO dbo.TestData(Name, RowNumber) VALUES ('SL', @Counter)

    SET @Counter = @Counter + 1

    END

    Then, the cursor function is:

    CREATE FUNCTION [dbo].[TestNumberRange]()

    RETURNS @DataDistribute TABLE (

    Name varchar(4)not null,

    StartNum intnot null,

    EndNum intnot null) AS

    BEGIN

    declare @name varchar(4),

    @preName varchar(4),

    @rowNumber int,

    @startNumber int,

    @endNumber int

    declare data_cursor cursor for

    select Name, RowNumber from TestData order by RowNumber

    set @preName= 'None'

    set @endNumber = 0

    open data_cursor

    fetch next from data_cursor into @name, @rowNumber

    while (@@fetch_status = 0)

    begin

    if (@name != @preName)

    begin

    if @endNumber != 0

    insert into @DataDistribute

    values (@preName, @startNumber, @endNumber)

    set @startNumber = @rowNumber

    set @preName = @name

    end

    set @endNumber =@rowNumber

    fetch next from data_cursorinto @name, @rowNumber

    end

    insert into @DataDistribute values (@name, @startNumber, @endNumber)

    close data_cursor

    deallocate data_cursor

    RETURN

    END

    Lastly, function can be tested as:

    select * from dbo.TestNumberRange()

    Can it be done without cursor?

  • min.li (4/28/2009)


    Hello,

    I recently had to use curosr to do a function in my work. Although I knew I should try to avoid cursor, but could not find a way. I wonder if you can figure out doing it without cursor.

    I have a table like this:

    Name RowNumber

    RL 1

    RL 2

    ..

    RL 10

    SL 11

    SL

    .. 15

    RL 16

    ..

    RL 30

    SL 31

    .. 41

    The RowNumber is granteed to be in sequence and there are actually more than two values for the 'Name' column, only two listed here for simplicity.

    The result of the function I need is a return table varaible like

    Name startRowNumber endRowNumber

    RL 1 10

    SL 11 15

    RL 16 30

    SL 31 41

    I am really curious if there is a way to achieve it without using cursor.

    Thanks.

    try this - i created a simple table to test it on

    create table temp1 (col1 varchar(2), rownum int)

    insert into temp1 values ('RL', 1)

    insert into temp1 values ('RL', 2)

    insert into temp1 values ('RL', 3)

    insert into temp1 values ('RL', 4)

    insert into temp1 values ('SL', 5)

    insert into temp1 values ('SL', 6)

    insert into temp1 values ('SL', 7)

    insert into temp1 values ('SL', 8)

    insert into temp1 values ('RL', 9)

    insert into temp1 values ('RL', 10)

    insert into temp1 values ('RL', 11)

    insert into temp1 values ('RL', 12)

    insert into temp1 values ('SL', 13)

    insert into temp1 values ('SL', 14)

    insert into temp1 values ('SL', 15)

    insert into temp1 values ('SL', 16)

    select D1.col1, d2.start, d1.[end]

    from (

    select t1.col1, t1.rownum as [end], row_number () over (order by t1.col1, t1.rownum) as rn

    from temp1 t1

    left outer join temp1 t2

    on t1.rownum + 1 = t2.rownum

    or t2.rownum is null

    where t1.col1 t2.col1 or t2.col1 is null) D1

    inner join

    (select t1.col1, t1.rownum as [start], row_number () over (order by t1.col1, t1.rownum) as rn

    from temp1 t1

    left outer join temp1 t2

    on t1.rownum - 1 = t2.rownum

    where t1.col1 t2.col1 or t2.col1 is null) D2

    on D1.col1 = D2.col1

    and d1.rn = d2.rn

    order by d2.start

    -- EDITED - just tested it with a 2000 row sample and noticed an error. I fixed the SQL in this posted

  • Andrew L. Smith (4/27/2009)


    Barry:

    I agree with this for the most part, but have an exception that I would like to see if you have an answer for. For example, I need to receive a number of records from a Message Queue and perform (or not perform) a stored procedure, passing information from the record read to the stored procedure. I am not updating anything and I do not need the combined results of the cursor. Can you suggest a way to accomplish this without a cursor? (I am actually passing the data to a stored procedure that e-mails me an alert that certain events have occured on the database server.)

    Regards,

    Andrew Smith, SQL Server DBA

    Heh. Message Broker is actually a very special and most difficult case because of the incompleteness of the final implementation it is both very set-orianted but also very antagonistic towrds sat-based usage. As such I will propably not be getting to it until that later installements (this also affects one of the most asked questions in the seires, but not so obviously).

    However, the answer to this question is easy, so I will address it now: 1) service your queue only with an activation stored procedure that only receives 1 row at a time. 2) for greater performance, set your activation count higher.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thank you min.li, I ran the test code and that works and is a good starting point. I will take a deeper look at the actual code later today, problably in the evening as I got to work atm. I feel confident your cursor can be fully eliminated.

    I will be working on SQL Server 2005, If you use a older verson let usknow that as well.

  • peter (4/28/2009)


    I will be working on SQL Server 2005, If you use a older verson let usknow that as well.

    Good point... the solution I posted is 2005/2008 specific. Probably could be altered to suit SQL Server 2000 though

  • Hi,

    Thanks for all reply. I am working on SQL server 2005. However, Samuel's solution does not work on my SQL server, which returns following error message:

    Msg 4104, Level 16, State 1, Line 1

    The multi-part identifier "d1.rn" could not be bound.

    Msg 4104, Level 16, State 1, Line 1

    The multi-part identifier "d2.rn" could not be bound.

    Msg 4104, Level 16, State 1, Line 1

    The multi-part identifier "d2.start" could not be bound.

    Msg 4104, Level 16, State 1, Line 1

    The multi-part identifier "d1.end" could not be bound.

    Msg 4104, Level 16, State 1, Line 1

    The multi-part identifier "d2.start" could not be bound.

  • min.li (4/28/2009)


    Hi,

    Thanks for all reply. I am working on SQL server 2005. However, Samuel's solution does not work on my SQL server, which returns following error message:

    Msg 4104, Level 16, State 1, Line 1

    The multi-part identifier "d1.rn" could not be bound.

    Msg 4104, Level 16, State 1, Line 1

    The multi-part identifier "d2.rn" could not be bound.

    Msg 4104, Level 16, State 1, Line 1

    The multi-part identifier "d2.start" could not be bound.

    Msg 4104, Level 16, State 1, Line 1

    The multi-part identifier "d1.end" could not be bound.

    Msg 4104, Level 16, State 1, Line 1

    The multi-part identifier "d2.start" could not be bound.

    weird... I just copied the post directly from here into a new SQL Server Query window and it ran without any errors

    Can you just make sure you copied it properly and didn't change anything

    UPDATE

    Just realised what the problem is --- you're using a case sensitive server and I (in my lazyness) wrote case INsensitive code

    DROP TABLE TEMP1

    CREATE TABLE TEMP1 (COL1 VARCHAR(2), ROWNUM INT)

    INSERT INTO TEMP1 VALUES ('RL', 1)

    INSERT INTO TEMP1 VALUES ('RL', 2)

    INSERT INTO TEMP1 VALUES ('RL', 3)

    INSERT INTO TEMP1 VALUES ('RL', 4)

    INSERT INTO TEMP1 VALUES ('SL', 5)

    INSERT INTO TEMP1 VALUES ('SL', 6)

    INSERT INTO TEMP1 VALUES ('SL', 7)

    INSERT INTO TEMP1 VALUES ('SL', 8)

    INSERT INTO TEMP1 VALUES ('RL', 9)

    INSERT INTO TEMP1 VALUES ('RL', 10)

    INSERT INTO TEMP1 VALUES ('RL', 11)

    INSERT INTO TEMP1 VALUES ('RL', 12)

    INSERT INTO TEMP1 VALUES ('SL', 13)

    INSERT INTO TEMP1 VALUES ('SL', 14)

    INSERT INTO TEMP1 VALUES ('SL', 15)

    INSERT INTO TEMP1 VALUES ('SL', 16)

    SELECT D1.COL1, D2.START, D1.[END]

    FROM (

    SELECT T1.COL1, T1.ROWNUM AS [END], ROW_NUMBER () OVER (ORDER BY T1.COL1, T1.ROWNUM) AS RN

    FROM TEMP1 T1

    LEFT OUTER JOIN TEMP1 T2

    ON T1.ROWNUM + 1 = T2.ROWNUM

    OR T2.ROWNUM IS NULL

    WHERE T1.COL1 T2.COL1 OR T2.COL1 IS NULL) D1

    INNER JOIN

    (SELECT T1.COL1, T1.ROWNUM AS [START], ROW_NUMBER () OVER (ORDER BY T1.COL1, T1.ROWNUM) AS RN

    FROM TEMP1 T1

    LEFT OUTER JOIN TEMP1 T2

    ON T1.ROWNUM - 1 = T2.ROWNUM

    WHERE T1.COL1 T2.COL1 OR T2.COL1 IS NULL) D2

    ON D1.COL1 = D2.COL1

    AND D1.RN = D2.RN

    ORDER BY D2.START

  • Thanks Samuel, It works now. Silly me, I should figure out myself.

  • Back min.li ( I couldn't resist the challenge)

    First I modified your test case in one important way!

    alter table dbo.TestData add constraint pk_TestData primary key clustered ( RowNumber );

    Adding a clustered index on the rownumber seemed fair as you said the order is enforced and implied it is not broken up by anything. The effect of the index kicks in as the source table increases in the number of records it contains.

    Here is what I came up with....I would not necessarily consider this full set based code, but I did eliminate the cursor and it scales far better then the original solution that used a cursor.

    create function dbo.TestNumberRange_nocursor1() returns table

    as

    return

    (

    with

    lowerEdgesQ( SegmentID, Name, RowNumber ) as

    (

    select

    row_number() over ( order by d.RowNumber asc )

    , d.Name

    , d.RowNumber

    from

    dbo.TestData as d

    where

    -- detects lower edge of each name

    not exists( select 1 from dbo.TestData as i where i.Name = d.Name and i.RowNumber = d.RowNumber - 1 )

    )

    , upperEdgesQ( SegmentID, Name, RowNumber ) as

    (

    select

    row_number() over ( order by d.RowNumber asc )

    , d.Name

    , d.RowNumber

    from

    dbo.TestData as d

    where

    -- detects upper edge of each name

    not exists( select 1 from dbo.TestData as i where i.Name = d.Name and i.RowNumber = d.RowNumber + 1 )

    )

    select

    lowerEdgesQ.Name

    , lowerEdgesQ.RowNumber as StartNum

    , upperEdgesQ.RowNumber as EndNum

    from

    lowerEdgesQ

    inner join upperEdgesQ on upperEdgesQ.SegmentID = lowerEdgesQ.SegmentID

    )

    ;

    For testing I used the timing code provided in the article and I included the solution of Samuel Vella as well.

    First I ran against the 100 records you put initially into the source table, results are:

    /* cursor original by min.li */

    CpuMs LogRds Elapsed

    ----------- -------------------- -----------

    0 425 3

    /* cursor free by peter */

    CpuMs LogRds Elapsed

    ----------- -------------------- -----------

    0 404 1

    /* cursor free by Samuel Vella */

    CpuMs LogRds Elapsed

    ----------- -------------------- -----------

    0 404 2

    Then I ran another test with 10000 records to showcase scaling, results are:

    /* cursor original by min.li */

    CpuMs LogRds Elapsed

    ----------- -------------------- -----------

    250 41083 249

    /* cursor free by peter */

    CpuMs LogRds Elapsed

    ----------- -------------------- -----------

    31 104 41

    /* cursor free by Samuel Vella */

    CpuMs LogRds Elapsed

    ----------- -------------------- -----------

    31 292 128

    As you can see, both the cursor free versions scale much better. The one I supplied while not pure set based yet, is not hard to understand and the function is inline. The later meaning that the optimizer can make it part of the main query and cut out any overhead it detects before execution!

  • Thanks peter, this works perfectly.

  • I did a quick optization of your own code which is interesting to know in case the time to alter existing code is limited!

    I changed the declaration of the cursor as:

    declare data_cursor cursor local fast_forward read_only for

    select Name, RowNumber from TestData order by RowNumber

    This restricts the cursor a little but suits your use just fine and is less resource hungry. This directly shows up in the following timings:

    For 100 records in the source table:

    /* cursor original by min.li */

    CpuMs LogRds Elapsed

    ----------- -------------------- -----------

    0 425 3

    0 122 1 /* cursor local fast_forward read_only for */

    /* cursor free by peter */

    CpuMs LogRds Elapsed

    ----------- -------------------- -----------

    0 404 1

    /* cursor free by Samuel Vella */

    CpuMs LogRds Elapsed

    ----------- -------------------- -----------

    0 404 2

    For 10000 records in the source table:

    /* cursor original by min.li */

    CpuMs LogRds Elapsed

    ----------- -------------------- -----------

    250 41083 249

    125 11080 136 /* cursor local fast_forward read_only for */

    /* cursor free by peter */

    CpuMs LogRds Elapsed

    ----------- -------------------- -----------

    31 104 41

    /* cursor free by Samuel Vella */

    CpuMs LogRds Elapsed

    ----------- -------------------- -----------

    31 292 128

    You see that your original with a modified cursor declaration is best performing on small sets in this case. But don't get ahead of yourself and put cursors everywhere 😉 as there is a BIG catch. They don't scale and they cannot be used in inline table valued functions and thus cannot be part of the optimization process when part of a larger query. This means that the optimized cursor version still ends up loosing in cases where we do not directly display the function results.

  • peter (4/28/2009)


    Here is what I came up with....I would not neccecarily consider this full set based code, but I did eliminate the cursor and it scales far better then the original solution that used a cursor.

    Interesting results Peter 🙂

    What intrigued me was that both our solutions are effectively the same however yours was significantly more efficient than my own

    There was one key difference between our solutions - we took different approaches to the problem of when to detect the change from one row name to the next.

    I had done mine through a self join and subtracting (or adding) 1 to the row number to define the join and when there was no join then that was the row to pick.

    you had used a subquery instead of a self join. I always work without subqueries whenever possible on the general assumption that they are less efficient

    However, converting your function into a query and then renaming the table and column names to use the test table I built (no indexes on it) and then comparing execution times on my own unindexed table confirmed that in this case the sub query is quicker.

    Completely counter to how I would normally tackle the problem.

    I finally learnt something today 😀

  • shashank (4/28/2009)


    I've recently used CURSOR in one of my application (unwishingly though). We were working on a survey application and we collected the data of our web forms in our custom entity objects and finally xml serialized them on a button's click and passed the entire xml to a stored procedure which traversed and read the xml and inserted the data into the actual database. Now the structure of the xml was something like this

    <Questions>

    <Question type="1">

    <text>how are you</text>

    <cretionDate>2008-12-3</cretionDate>

    <QuestionDetails>

    some more nodes........

    </QuestionDetails>

    </Question>

    <Question type="2">

    <textvFeeling Good</text>

    <cretionDate>2008-12-3</cretionDate>

    <QuestionDetails>

    some more nodes........

    </QuestionDetails>

    </Question>

    </Questions>

    How could i prevent the use of cursor???

    Obviously, nobody can give a complete answer without more specific requirements, as well as sample data.

    But, if you use the XML data type you can query the XML using the nodes(), and value() functions.

  • Thanks Samuel and Peter.

    The real environment where this function to be used, the 'TestData' table is not really large. Or to be exact, the TestData table is large, but this function only work on a subset of the TestData table each time in sequence, the size of the subset is defined by a pass in parameter 'PageSize', which will not be over 1000 maximum. In this context, I think the cursor solution is still suitable, as scaling is not such important factor to consider. And, I have to say that the cursor solution is a lot easier to read and understand for me as the solutions that you two have offered are quite complex to me at the moment, I will not be able to work it out by myself. So I still think cursor is appropriate in such circumstances.

Viewing 15 posts - 76 through 90 (of 316 total)

You must be logged in to reply to this topic. Login to reply