There Must Be 15 Ways To Lose Your Cursors… Part 2: Just Put It in a S

  • aaronxramirez (4/27/2009)


    Just stepped in for a quick read up on the article...good article, however the examples are a bit naive and are not applied toward anything I would have ever thought someone might use a cursor for - why would anyone use a cursor to loop over a table when the same result is created with a simple select statement. I understand the examples are made simple to make a point, but the only time I have ever needed a cursor is to achieve some sort of lookahead in the result set, such that data in in row 1 depended on knowing the value of some data in row 10 and the only way to get that (without building an external app/script) was with a cursor.

    Thanks for the article just the same.

    First, you probably haven't seen many of the posts which use cursors where they aren't needed, and yes they do look very similar to the cursors in the article.

    Second, developers used to writing in VB, VB.NET, C, C++, C#, and other languages that find themselves writing SQL (T-SQL in this case) quite frequently start using cursors, for just the reasons mentioned above, it fits their thought processes. Until they start thinking in sets, cursors do seem easier to write even with all the extra work needed to set them up. Some of these developers may learn quickly that set based processing works better, but others sometimes have to be dragged into the concept of set based thinking kicking and screaming.

    Third, if you still have access to the cursor code you mention above, how about posting it? Let's see if we can find a performant set-based approach to the problem.

  • This example is based on actual SQL code written by a developer (who shall remain anonymous) who was exceptionally proud to have figured out this entire routine on his own. However, he needed my help because although this routine did successfully list every customers name when he ran it in Query Analyzer, he could not get it to return any data to his application program.

    This immediately brings to mind what would you do if you WANT to actually print the data and not return it as a result set? I don't have a scenario in mind, just the question...

  • aaronxramirez (4/27/2009)


    Just stepped in for a quick read up on the article...good article, however the examples are a bit naive and are not applied toward anything I would have ever thought someone might use a cursor for - why would anyone use a cursor to loop over a table when the same result is created with a simple select statement.

    Although you may not think of doing something like that, others have and will. I think Barry's examples are going to get increasingly complex.

    I understand the examples are made simple to make a point, but the only time I have ever needed a cursor is to achieve some sort of lookahead in the result set, such that data in in row 1 depended on knowing the value of some data in row 10 and the only way to get that (without building an external app/script) was with a cursor.

    You can probably achieve that result without a cursor as well.

    I bet Barry will cover that later in the series.

  • john.moreno (4/27/2009)


    This example is based on actual SQL code written by a developer (who shall remain anonymous) who was exceptionally proud to have figured out this entire routine on his own. However, he needed my help because although this routine did successfully list every customers name when he ran it in Query Analyzer, he could not get it to return any data to his application program.

    This immediately brings to mind what would you do if you WANT to actually print the data and not return it as a result set? I don't have a scenario in mind, just the question...

    John,

    I ran this google search, .NET retrieve T-SQL Print Statement Results, and the top result shows how to get the results of a print statement using ADO.NET. I'm not sure when you'd use this in a client application though, unless I was writing a Query Tool to replace SSMS.

  • Excellent article. Love the simplicity of the stated cursor replacement method; look forward to seeing more of this!

    😎 Kate The Great :w00t:
    If you don't have time to do it right the first time, where will you find time to do it again?

  • Jack Corbett (4/27/2009)


    john.moreno (4/27/2009)


    This example is based on actual SQL code written by a developer (who shall remain anonymous) who was exceptionally proud to have figured out this entire routine on his own. However, he needed my help because although this routine did successfully list every customers name when he ran it in Query Analyzer, he could not get it to return any data to his application program.

    This immediately brings to mind what would you do if you WANT to actually print the data and not return it as a result set? I don't have a scenario in mind, just the question...

    John,

    I ran this google search, .NET retrieve T-SQL Print Statement Results, and the top result shows how to get the results of a print statement using ADO.NET.

    You misunderstand my question (although it would have been a a solution for the annoymous developer mentioned). How could you replicate the anonymous developers results (i.e. text in Messages tab in QA/Management Studio, but NO resultset) without using a cursor?

  • aaronxramirez (4/27/2009)


    the only time I have ever needed a cursor is to achieve some sort of lookahead in the result set, such that data in in row 1 depended on knowing the value of some data in row 10 and the only way to get that (without building an external app/script) was with a cursor.

    [font="Verdana"]In SQL 2005 you can use row_number() to do that.

    Where I use row-by-row loops (I don't actually use cursors, but it's the same thing) is where I need to repeatedly call a stored procedure that's been designed only to take one "row" of input. I feel that's more of an issue with design rather than implementation.

    [/font]

  • Have to agree, set based should be the required standard, with using server side cursor requiring an explaination as to why a set based approach is not warranted.

    Folk legend has it that cursors were added to the ANSI standard at the request of Oracle.

    Oracle developers are addicted to cursors, as are Java / Hibernate developers.

    That's partly why they like MySQL, as it is effectively a 1970's style ISAM data structure.

    Hows that for progress!

    The more you are prepared, the less you need it.

  • Before

    CpuMsLogRdsElapsed

    47658344865124

    After

    CpuMsLogRdsElapsed

    4723654

  • Before

    CpuMsLogRdsElapsed

    4765 8344865124

    After

    CpuMsLogRdsElapsed

    47 236 54

  • I will say, one Cursor I haven't quite figured out how to eliminate, is when a vendor requires that something be done via a Stored Proc, and you want to do that something for each row in your result set.

    For example:

    DECLARE

    @Id int,

    @Rcpt varchar(10),

    @Msg varchar(200)

    DECLARE theCursor CURSOR LOCAL FAST_FORWARD FOR

    SELECT Id, Rcpt, Msg

    FROM [dbo].[NewMessages]

    WHERE SentDT IS NULL

    OPEN theCursor

    FETCH NEXT FROM theCursor INTO @Id, @Rcpt, @Msg

    WHILE @@FETCH_STATUS = 0 BEGIN

    EXECUTE [dbo].[SuperSecretSendNewMessageSP] @Rcpt, @Msg

    UPDATE [dbo].[NewMessages] SET SentDT = getdate() WHERE Id = @Id

    FETCH NEXT FROM theCursor INTO @Id, @Rcpt, @Msg

    END

    CLOSE theCursor

    DEALLOCATE theCursor

  • So, I have to agree with some of the thoughts I've seen on the board - namely, so far, the article(s) have not shown a case where I would ever have thought of using a cursor. At this point, I'm not convinced that cursors can ALWAYS be replaced by set-based processing, though I agree that they should be avoided.

    Just to stir the pot, let me present you with a situation in which we are using a cursor here at my company - a situation where I cannot see a useful way to eliminate the cursor.

    So, we receive a data file from an external source on a regular basis. This file contains roughly 1.5 million rows, give or take. We load this file into a staging table, and then run a process on it. This process is a cursor, which opens a transaction, applies pieces of the data to various tables, and then marks the row as "complete" (by setting a field on the row) and commits the transaction. If any error is encountered while attempting to process the row, all the processing for that row is rolled back, the error message is written to an "error_msg" field on the row, and the row is marked as "in error". The cursor then moves on to the next row.

    The purpose of this is that if a single row is in error, the specific problem is identified, and the rest of the data is not held up by the problem. The problems could be of various kinds - problems in the data file that was sent to us, rows or pages in the target table(s) locked by a user at the moment, etc. This could allow us to only have to reprocess a few rows, rather than the entire 1.5 million rows, which takes a while. Also, nailing down WHICH row caused the issue when processing as a set is VERY difficult.

    So, the gauntlet is down - any suggestions on how to achieve the same benefits using set processing?

    In summary, we want:

    - If there is an error in a row, none of the various updates/inserts that are done should be kept - all should be rolled back.

    - Any rows which have no error should process completely, committing changes to the database.

    - We should be able to tell after the fact which rows processed completely, and which rows have an error, as well as the specific error each row experienced (they could be different for each row in error)

  • greatheep (4/27/2009)


    So, I have to agree with some of the thoughts I've seen on the board - namely, so far, the article(s) have not shown a case where I would ever have thought of using a cursor. At this point, I'm not convinced that cursors can ALWAYS be replaced by set-based processing, though I agree that they should be avoided.

    Just to stir the pot, let me present you with a situation in which we are using a cursor here at my company - a situation where I cannot see a useful way to eliminate the cursor.

    So, we receive a data file from an external source on a regular basis. This file contains roughly 1.5 million rows, give or take. We load this file into a staging table, and then run a process on it. This process is a cursor, which opens a transaction, applies pieces of the data to various tables, and then marks the row as "complete" (by setting a field on the row) and commits the transaction. If any error is encountered while attempting to process the row, all the processing for that row is rolled back, the error message is written to an "error_msg" field on the row, and the row is marked as "in error". The cursor then moves on to the next row.

    The purpose of this is that if a single row is in error, the specific problem is identified, and the rest of the data is not held up by the problem. The problems could be of various kinds - problems in the data file that was sent to us, rows or pages in the target table(s) locked by a user at the moment, etc. This could allow us to only have to reprocess a few rows, rather than the entire 1.5 million rows, which takes a while. Also, nailing down WHICH row caused the issue when processing as a set is VERY difficult.

    So, the gauntlet is down - any suggestions on how to achieve the same benefits using set processing?

    In summary, we want:

    - If there is an error in a row, none of the various updates/inserts that are done should be kept - all should be rolled back.

    - Any rows which have no error should process completely, committing changes to the database.

    - We should be able to tell after the fact which rows processed completely, and which rows have an error, as well as the specific error each row experienced (they could be different for each row in error)

    Unfortunately, it's a broken guantlet. We can't do anything without requirements, DDL for the tables (staging and destination), sample data (in a consumable format -- see the first article linked in my signature block below), processing rules for the data, expected results based on the sample data provided. If part of the "errors" is data already existing in the destination table(s) be sure to provide sample data for them as well.

  • I often work with a very similar issues. We do not use cursors (EVER). I'm hoping that the 3rd article address these kinds of issues. I will say that the 'logic' requires row based operations but you use set based commands.

    John

  • greatheep (4/27/2009)


    So, the gauntlet is down - any suggestions on how to achieve the same benefits using set processing?

    In summary, we want:

    - If there is an error in a row, none of the various updates/inserts that are done should be kept - all should be rolled back.

    - Any rows which have no error should process completely, committing changes to the database.

    - We should be able to tell after the fact which rows processed completely, and which rows have an error, as well as the specific error each row experienced (they could be different for each row in error)

    [font="Verdana"]This is a prime candidate for a set-based approach, either in T-SQL or SSIS. (I work in a SQL Server based data warehousing environment, so we're doing this sort of thing all the time.)

    If I can make a suggestion: please post this as a separate thread to the board, along with the additional information that Lynn requested, and post a link to that thread back to this. That way we can avoid derailing the conversation here.

    [/font]

Viewing 15 posts - 46 through 60 (of 316 total)

You must be logged in to reply to this topic. Login to reply