Cursor vs. set-based procedure

  • Hi, everybody. I have a problem optimizing my stored procedure, and after long research on the cursor vs. set-based approach, I'm still not sure how to solve it.

    Here's the task: I have a table with connection logs. There are 3 types of records: start (type=2), interim (type=3) and stop (type=4), and I have to create a single output record by joining the start leg with each interim and stop leg.

    We receive the logs in a text file, which we import into a table and then run a procedure over.

    I've created a query that gives me a result set with the start and stop parts of the record:

    select *
    from (select * from log where msgtype = 2) A
    inner join (select * from log where msgtype > 2) B
        on A.SessionID = B.SessionID
    where A.FileID = XY or B.FileID = XY

    The procedure looks like this:

    declare cur cursor for QUERY_ABOVE
    open cur
    fetch next from cur into @vars...
    while @@FETCH_STATUS = 0
    begin
        -- do the processing and format the output record
        -- insert the output record into the result table
        -- update: mark the original records as joined
        fetch next from cur into @vars...
    end
    close cur
    deallocate cur

    The problem is that the files we import are getting bigger and bigger; currently the procedure needs 15 minutes to process 130,000 records, so I've started looking for solutions. After some debugging, I realized that stepping through the cursor simply takes a lot of time and that I can't improve much on the current logic, and after reading the articles on cursors vs. the set-based approach, it became clear that I should rewrite the whole thing. BUT, I just don't see how I could put all the formatting logic from the procedure into a query (which is what I should do, if I'm not wrong)?

    The output record consists of 20+ fixed-length fields, and the content of each field is calculated from various columns of the original table in different ways, with different criteria; often the value of, let's say, field_15 depends on the value in field_1, etc.

    I could make it a UDF and then do something like

    select my_udf(sessionid, start, stop, .....) from QUERY_ABOVE

    but something tells me that it is the same thing as a cursor: it will still process row by row, not as a set.

    So, how do I avoid the cursor?

    I hope I've been clear enough. Any clues and help are welcome.

  • My default answer would be that it can be translated into a set-based solution, but it depends on what is happening when you say 'do the processing and formatting of the output record'. Is that a single simple statement?

    Are you able to post the full code + DDL and sample data?



    Clear Sky SQL
    My Blog

  • Unfortunately, I'm not allowed to post the full code :(

    But here is a bunch of similar code for determining the value of each field in the output record:

    /* FIELD 2 */
    if @Field_1 = 'F5'
        select @field_2 = '63'
    else
        if len(@TerminatingIOI) > 1
            if @ConnType = 0
                select @field_2 = '54'
            else
                select @field_2 = '61'
        else
            if @ConnType = 0
                select @field_2 = '56'
            else
                select @field_2 = '60'

    /* FIELD 3 */
    if @field_1 = 'F3'
    begin
        if @B_RECNUM = 1 and @B_MSGTYPE = 3
            select @field_3 = '2'
        else if @B_RECNUM > 1 and @B_MSGTYPE = 3
            select @field_3 = '3'
        else if @B_RECNUM > 1 and @B_MSGTYPE = 4
            select @field_3 = '4'
    end

    Of course, these are very simple ones; there are a few more complicated fields which depend on the result of some stored procedure, like this:

    execute checkportnum @POperNum, @field_14, @field_15, @Poper OUTPUT

    if @Poper > '' select @field_11 = @Poper + @POperNum

    etc etc

    In the meantime, I've got an idea: create a temporary table in which I store the results of the query, with additional columns for the output fields. Then, instead of looping through the recordset and doing this formatting row by row, I could use a series of updates based on the logic for each field as shown above?
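
    Something like this, maybe (just a sketch; the staging column names are assumptions based on the variables above, not the real schema):

    -- Stage the joined rows once, with extra columns for the output fields.
    select A.SessionID,
           A.Field_1, A.TerminatingIOI, A.ConnType,  -- source values (assumed names)
           cast(null as char(2)) as field_2           -- placeholder output column
    into #staging
    from (select * from log where msgtype = 2) A
    inner join (select * from log where msgtype > 2) B
        on A.SessionID = B.SessionID;

    -- Then one set-based UPDATE per output field, replacing the IF ladder:
    update #staging
    set field_2 = case when Field_1 = 'F5' then '63'
                       when len(TerminatingIOI) > 1
                            then case when ConnType = 0 then '54' else '61' end
                       else case when ConnType = 0 then '56' else '60' end
                  end;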

  • If I were you, I would start by simplifying the logic in the loop and moving it piecemeal into the cursor statement.

    i.e., something like this (untested):

    Create function dbo.CalcField_2 (@Field_1 char(2), @TerminatingIOI char(<Something>), @ConnType int)
    returns table
    as
    return
    (
        Select case when @Field_1 = 'F5' then '63'
                    else case when len(@TerminatingIOI) > 1
                              then case when @ConnType = 0 then '54' else '61' end
                              else case when @ConnType = 0 then '56' else '60' end
                         end
               end as Field_2
        from (Select 1 as Col) as Row
    )

    So then you will be able to remove that code and use it like this...

    Select *
    from <YourSourceTable>
    cross apply dbo.CalcField_2(Field_1, TerminatingIOI, ConnType)

    After you've done that to all the smaller bits of code, you should be in a better position to remove the looping altogether.

    BTW, notice how I've used an inline table function, not a scalar one.



    Clear Sky SQL
    My Blog

  • If you are going to use functions, use scalars and not table-based ones for the logic. That should allow you to maintain the code without taking too much of a performance hit. As for changing your cursor to a set-based solution: as long as you are not calling stored procedures (logging messages or some such), you should be able to get to a single update statement.

    My suggestion is that you investigate using a CTE to simplify your JOINs and then create scalar functions to process the required transformations, which again will simplify writing the SQL. Once you have it working, you can test moving different scalars back into your core procedure until you find the right balance.
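
    For example (a sketch only; the CTE reuses the SessionID-matching join from the first post, and dbo.fn_CalcField_2 is a hypothetical scalar wrapper around the FIELD 2 logic):

    ;with StartStop as
    (
        -- the start/stop join from the original query
        select A.SessionID, A.Field_1, A.TerminatingIOI, A.ConnType
        from (select * from log where msgtype = 2) A
        inner join (select * from log where msgtype > 2) B
            on A.SessionID = B.SessionID
    )
    select SessionID,
           dbo.fn_CalcField_2(Field_1, TerminatingIOI, ConnType) as field_2
    from StartStop;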

    --------------------------------------------------------------------------
    When you realize you've dug yourself into a hole....Step 1...stop digging.

  • Charles.Otten (12/16/2010)


    If you are going to use functions, use scalars and not table-based ones for the logic. That should allow you to maintain the code without taking too much of a performance hit.

    I'm afraid I can't agree with that statement at all.

    Scalar functions are nothing but a 'performance sink'. Very rarely (twice or so this year) do I need one.

    Even an empty scalar function uses more resources, by orders of magnitude.

    For example:

    create function Scalar(@id integer)
    returns int
    as
    begin
        return 1
    end
    go

    create function Inline(@id integer)
    returns table
    as
    return
    (
        Select 1 as one
        from (Select null as n) as x
    )
    go

    select top(100000) 1, dbo.Scalar(1)
    from sys.columns a cross join sys.columns b
    go

    select top(100000) 1, inl.one
    from sys.columns a cross join sys.columns b
    cross apply dbo.Inline(1) inl
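
    If you want to measure the gap on your own box, one way (standard SQL Server session syntax, nothing specific to this example) is to wrap each query with the timing switch:

    set statistics time on   -- reports CPU and elapsed time per statement
    go
    select top(100000) 1, dbo.Scalar(1)
    from sys.columns a cross join sys.columns b
    go
    select top(100000) 1, inl.one
    from sys.columns a cross join sys.columns b
    cross apply dbo.Inline(1) inl
    go
    set statistics time off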

    The other danger with scalars is that someone else may decide to 'enhance' the function with more functionality and add in a few SELECTs. 🙂



    Clear Sky SQL
    My Blog

  • I agree that the scalar functionality you are describing is bad. Virtually any scalar using a SELECT is just going to be awful. The scalars I'm talking about would only be passed the values that you want the function to "calculate". Also, note I suggested using scalars to simplify the SQL development and then removing them by moving the functionality back into the source SQL.

    A scalar function (one that does not touch any table objects) usually performs reasonably well, and it does simplify coding. I do, however, strongly suggest they get moved back into the source SQL.

    --------------------------------------------------------------------------
    When you realize you've dug yourself into a hole....Step 1...stop digging.

  • Dave Ballantyne (12/16/2010)

    After you've done that to all the smaller bits of code, you should be in a better position to remove the looping altogether.

    BTW, notice how I've used an inline table function, not a scalar one.

    So, in the end I'd have something like this?

    Select *
    from <YourSourceTable>
    cross apply dbo.CalcField_1(param1, param2, param3, ...)
    cross apply dbo.CalcField_2(param1, param2, param3, ...)
    ...
    cross apply dbo.CalcField_20(param1, param2, param3, ...)

    It looks pretty messy and uninviting, I must admit :(

    What about my idea with the temp table and series of updates? Is it worth trying?

  • KennethParker (12/16/2010)


    So, in the end I'd have something like this?

    Select *
    from <YourSourceTable>
    cross apply dbo.CalcField_1(param1, param2, param3, ...)
    cross apply dbo.CalcField_2(param1, param2, param3, ...)
    ...
    cross apply dbo.CalcField_20(param1, param2, param3, ...)

    It looks pretty messy and uninviting, I must admit :(

    What about my idea with the temp table and series of updates? Is it worth trying?

    Messy and uninviting, possibly 🙂, but now this is giving you more options: you could create a view, calculated columns, etc.

    In any case, if I had to choose between 'messy' code and slow code, I know what I would choose 😉

    The temp table idea has legs, but it may not be the most performant option due to the constant updating; it could create a lot of logging data. I would treat that as a secondary concern.

    For now, concentrate on getting as much logic as possible into the primary select.
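
    To illustrate with the FIELD 3 logic from above (a sketch; I'm assuming the @B_ variables map to columns RECNUM and MSGTYPE on the interim/stop leg):

    select A.SessionID,
           case when A.Field_1 = 'F3' and B.RECNUM = 1 and B.MSGTYPE = 3 then '2'
                when A.Field_1 = 'F3' and B.RECNUM > 1 and B.MSGTYPE = 3 then '3'
                when A.Field_1 = 'F3' and B.RECNUM > 1 and B.MSGTYPE = 4 then '4'
           end as field_3
           -- ...the other fields as further CASE expressions or cross applies
    from (select * from log where msgtype = 2) A
    inner join (select * from log where msgtype > 2) B
        on A.SessionID = B.SessionID;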



    Clear Sky SQL
    My Blog

  • Charles.Otten (12/16/2010)


    I agree that the scalar functionality you are describing is bad. Virtually any scalar using a SELECT is just going to be awful. The scalars I'm talking about would only be passed the values that you want the function to "calculate". Also, note I suggested using scalars to simplify the SQL development and then removing them by moving the functionality back into the source SQL.

    A scalar function (one that does not touch any table objects) usually performs reasonably well, and it does simplify coding. I do, however, strongly suggest they get moved back into the source SQL.

    Unfortunately, in some cases I must use stored procedures or some SELECTs to determine the field value, which means that scalar functions wouldn't throw out the cursor completely.

  • Dave Ballantyne (12/16/2010)


    KennethParker (12/16/2010)


    So, in the end I'd have something like this?

    Select *
    from <YourSourceTable>
    cross apply dbo.CalcField_1(param1, param2, param3, ...)
    cross apply dbo.CalcField_2(param1, param2, param3, ...)
    ...
    cross apply dbo.CalcField_20(param1, param2, param3, ...)

    It looks pretty messy and uninviting, I must admit :(

    What about my idea with the temp table and series of updates? Is it worth trying?

    Messy and uninviting, possibly 🙂, but now this is giving you more options: you could create a view, calculated columns, etc.

    In any case, if I had to choose between 'messy' code and slow code, I know what I would choose 😉

    The temp table idea has legs, but it may not be the most performant option due to the constant updating; it could create a lot of logging data. I would treat that as a secondary concern.

    For now, concentrate on getting as much logic as possible into the primary select.

    I would agree here. Messy code is better than slow code.

    I would look heavily into the temp table route. Many times a complex process will run faster if you break it into a series of temp table inserts/updates/deletes/selects.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • Perhaps I am restating the obvious, but at a minimum you can use set-based operations to set the values of all columns that don't require a stored procedure. Given the set of operations you've described, I don't even see why you'd need temp tables; just use a sequence of operations which also update a status flag in your source and/or target log tables. (You've stated you already have the former.) Then use a cursor for the much smaller set of variables and operations truly requiring sprocs.

    The process would look something like this. First:

    INSERT INTO <your result table> (a, b, c, etc.)
    SELECT ... -- your SessionID-matching join here

    Hopefully you could also replace/refactor away all of those stored procedures you are calling with more set-based logic; this would probably require adding more joins to your original SessionID-matching join, but depending on the logic in your sprocs it could be doable, particularly if those sprocs aren't themselves infected with more cursor logic (or UPDATEs affecting multiple tables) and are SELECT-oriented.

    Then, to indicate that you have processed the old records, even partially, you can do an UPDATE on your old log tables based on a very similar SELECT join clause. (You know that you can update a single table based on joined tables, right?)
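
    In T-SQL that pattern looks like this (a sketch; the result table and the Processed flag are hypothetical names):

    -- Drive the UPDATE of the log table from a join against the result table.
    update L
    set L.Processed = 1                  -- hypothetical status flag column
    from log L
    inner join result R
        on R.SessionID = L.SessionID     -- the SessionID-matching condition
    where L.msgtype in (2, 3, 4);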

    Then, if you hadn't eliminated all the sprocs, you could have your minimal cursor logic iterate through a much smaller set of fields, based on either your earlier SessionID-matching join and/or your new target table, and some operation in the cursor would do an UPDATE on the needed rows.
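
    A sketch of that reduced cursor, using the checkportnum call quoted earlier in the thread (the variable types and the result table name are guesses):

    declare @SessionID int, @POperNum varchar(10), @field_14 varchar(10),
            @field_15 varchar(10), @Poper varchar(10);

    -- The cursor now only fetches the columns the sproc actually needs.
    declare cur cursor local fast_forward for
        select SessionID, POperNum, field_14, field_15 from result;
    open cur;
    fetch next from cur into @SessionID, @POperNum, @field_14, @field_15;
    while @@FETCH_STATUS = 0
    begin
        execute checkportnum @POperNum, @field_14, @field_15, @Poper OUTPUT;
        if @Poper > ''
            update result set field_11 = @Poper + @POperNum
            where SessionID = @SessionID;
        fetch next from cur into @SessionID, @POperNum, @field_14, @field_15;
    end
    close cur;
    deallocate cur;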

    And then (assuming you hadn't eliminated/refactored all the sprocs), you could re-update your status flag in the original log tables (again based on your SessionID-matching join condition) to indicate that the update is fully, not just partially, complete.

    (I'd also mention that if it eases the implementation logic, or if you want to refactor away sprocs that in turn involve multiple UPDATEs, it might help to have a similar flag in your new combined table that tracks update status. It would effectively track the SessionIDs which still need to be operated on by your intermediate set operations, rather than repeating your big SessionID-matching join each time: you insert those records early on, and from then on you just operate on the set you just inserted, using the flag.)

  • Thank you for your advice, you've all been very helpful. Now I have a clearer picture of what to do and which way to go. I'll let you know about the outcome of the whole project; hopefully it won't be such a nightmare after all :)
