There Must Be 15 Ways To Lose Your Cursors... part 1, Introduction

  • Thomas (4/16/2009)


    As I said, cross joining to sys.columns is more than just about thinking set-based. It is about using the system tables in a clever way. If a developer were populating a list of numbers using a cross join of dozens of different production tables would you not find this odd (and worrisome)?

    [font="Verdana"]No, I'm not missing the point at all. See my earlier comments about using "the right tool for the job" and where row by row approaches sometimes are the right tool (although the cases I find for using row-by-row are fewer.)

    What I'm saying is that where a set-based approach is the right tool, it should be used. Saying "oh, it's too complex" is wrong.

    If you object to using system meta-data tables, do you also object to using system libraries in languages such as VB and C#? It's much the same concept. Part of learning to use a database is learning the concept of meta-data (after all, DDL is basically a meta-data language.)

    Yes, it's still possible to get it wrong (cross joining dozens of different production tables is a good example.) I would find it more worrisome if they had a cursor across those tables!

    So it's all about learning the tool and how to use it correctly. You can do insane things in any language, and SQL is no exception.

    [/font]

  • Thomas (4/16/2009)


    It is clear that you gentlemen are missing the point. That developers should strive to refine their craft is obvious. No one disputes that. That set-based solutions are preferred in database operations is also obvious. There is no great revelation there.

    I think for some there might be a great revelation there. Keep in mind that this site is frequented by developers and DBAs of all levels of experience. The fact of the matter is that the little things that a developer with 10 years of experience takes for granted could be a brand new concept to someone who just started in the field last Tuesday. I was reminded of this just the other day on a different issue, when I pointed out something I've been using for years to a junior developer--you would have thought it was Christmas morning.

    I think it's obvious that RBarry's target audience is junior developers who may not understand what all the fuss about "set-based" versus "cursors" is about, especially developers parachuting into SQLand from C++ville or the Republic of VB.

    On the other hand I also think it doesn't hurt the more experienced professionals to be reminded every once in a while.

  • Right Mike. And as the series goes on we will get to increasingly complex, difficult cases, so at some point it should reach a level where everyone can learn something new.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thomas (4/16/2009)


    It is clear that you gentlemen are missing the point. That developers should strive to refine their craft is obvious. No one disputes that. That set-based solutions are preferred in database operations is also obvious. There is no great revelation there. What is being missed is that:

    A: There are times, which are admittedly more rare these days, when a set-based solution performs worse given the current platform and version than a cursor where you have more control over the resources in use. There are host of reasons why this might happen but it does happen.

    B: There are times, which hopefully we all will avoid, when you are required to execute logic on each row in a table (sorry, various columns in given resultset) in such a way that the current set based operations are incapable of handling. That scenario, again in my travels, has been very rare. Mostly, I have avoided such situations by putting less business logic (or none) in the data layer.

    There are many ways of doing things set-based some are more intuitive than others. As I said, cross joining to sys.columns is more than just about thinking set-based. It is about using the system tables in a clever way. If a developer were populating a list of numbers using a cross join of dozens of different production tables would you not find this odd (and worrisome)?

    Heh... that's a heck of a switch in gears from the "intuitive" stance you took just a couple of posts ago.

    I don't believe those points are being missed at all. In fact, I absolutely agree with what you said above and I believe that you'll find that most of the others agree. But, if you try to justify cursors to satiate the lack of knowledge of developers, you'll continue to get extreme resistance to your arguments. The whole purpose of the article is to educate and you keep fighting with the "teacher".:-)

    Also, what does "populating a list of numbers using a cross join of dozens of different production tables" have to do with anything? As you said, most developers don't even know how to create a list of numbers using just one cross-join so it's not even remotely likely that they'll use "dozens". 😉

    I've not seen the rest of Barry's series on the subject... but, knowing Barry, good things will come to light in the rest of the series that will serve as both revelation for newbies and interesting points of discussion for the veterans. Part of the point of this series is that there are a huge number of things that cursors are inappropriately used for and the one of the aims is to show folks ways around those places. And, if a simple introduction, such as this one, can draw so many posts from both newbies and experts, just imagine how many "lurkers" there are that are learning not only from the article, but from the hayday of activity that follows on the discussion post.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • RBarryYoung (4/16/2009)


    Right Mike. And as the series goes on we will get to increasingly complex, difficult cases, so at some point it should reach a level where everyone can learn something new.

    Absolutely. Case in point -- a couple of the junior devs on my team saw your article before I did and they came running to me -- "Did you see that article about cursors on SSC?!" They thought it was very entertaining and I appreciate the fact that it reinforces the concepts I try to get across to them about set-based programming and best practices on a daily basis (though I must admit I do it in a far less entertaining fashion since we're up against tight deadlines). Personally I'm going to use your series to reinforce these concepts to my junior team members.

  • Great article! I have been practice declarative programming for years. It is definitely a different approach compare to procedure programming. As some previous post, using this (declarative) approach to rewrite (to get rid of the cursors) for performance can really make someone SHINE (I did it several times). You deserve that song!

    I am thrilled to see many solutions offer for a simple problem as "there are many ways to skin a cat".

    These remind me the "shoot yourself in the foot" joke.

    I am a big fan of the declarative programming. However, in some case, I must amid that having a cursor make the solution more "readable", and hence, more maintainable. Another problem arise is to get rid of the cursor will need to introduce dynamic SQL, which could be a worse devil than cursor.

    I wonder how to justify the cost of getting rid of cursor altogether. Will part 2 address that? I could not wait to see part 2...

  • Thanks, Ted!

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • ted quan (4/16/2009)


    Great article! I have been practice declarative programming for years. It is definitely a different approach compare to procedure programming. As some previous post, using this (declarative) approach to rewrite (to get rid of the cursors) for performance can really make someone SHINE (I did it several times). You deserve that song!

    I am thrilled to see many solutions offer for a simple problem as "there are many ways to skin a cat".

    These remind me the "shoot yourself in the foot" joke.

    I am a big fan of the declarative programming. However, in some case, I must amid that having a cursor make the solution more "readable", and hence, more maintainable. Another problem arise is to get rid of the cursor will need to introduce dynamic SQL, which could be a worse devil than cursor.

    I wonder how to justify the cost of getting rid of cursor altogether. Will part 2 address that? I could not wait to see part 2...

    Hi ted,

    Hope you don't mind if I jump in. You can often justify eliminating cursors just through performance. In SQL 2005+ the readability issues are greatly diminished through CTEs, windowing functions, new operators, and even CLR.

    When you mention dynamic SQL it sounds to me almost like you might be using cursors for automating administrative tasks? I have a personal rule of thumb for cursor usage - (1) when the data sets are small and (2) when the order of operation is important. Administrative tasks tend to fit these rules well, for instance when you need to perform administrative tasks on indexes for 20 tables and they have to be done in a very specific order.

    For me, if I had code that was too complex to be performed in declarative SQL then I'd have to wonder if SQL was the right place to try to perform it. There are too many other languages and tools (SSIS comes to mind) that excel at procedural/RBAR processing. (apologies to Mr. Moden) 🙂

  • Heh... that's a heck of a switch

    in gears from the "intuitive" stance

    you took just a couple of posts ago.

    Actually, it goes directly to my point. Unlike DBA purists, developers do not see the world nearly as black and white. I'm arguing against the "teacher" making blanket claims like "you should never need to use cursors" or "the only reason people use cursors is because they do not know the set-based approach." Such wild claims are specious.

    In very rare cases, using a cursor for clarity can make sense specifically because most developers can easily understand them. Someone mentioned the example of not using objects in C#. A better analogy would be the use of a clever solution with anonymous methods. I did this very thing many months ago. After having to re-explain it to three developers and a consultant, I broke down and rewrote without the anonymous method so it would be clearer. It is possible that my revised version did not perform as well but the gains in readability and maintenance outweigh those costs unless the performance difference is prohibitive. "Clever" solutions can sometimes fit into this mold.

    If one can document the clever solution and be done, great! The numbers example is admittedly a poor one because it would never get into production code. I asked if a set-based solution existed and indeed it did. I learned something new! However, if I did see a join to something in the master database in production code, I'd have to stop and investigate: do they actually know what they are doing or is this is mistake? That is why the example of populating a list of numbers from a dozen different production tables is applicable. If you saw something like

    Select Top 1000000000 Into Numbers

    From ProductionTableA, ProductionTableB...

    Would you not be at least a little suspicious? Similarly, if I saw a join to something in the master database I'd be a little suspicious. That you'd stop to investigate means that it is less intuitive. Cost obtuseness is not entirely in the eye of the author; it is in the eye of the person that takes over your code.

  • Mark (4/14/2009)


    neufeldb4 (4/13/2009)


    Allow me to make some observations.

    ...

    3. ... The first is that it does nothing

    The code returns the square of the number of rows in master.sys.columns, trivial I know but I wouldn't call it "nothing". The obvious set based re-write is just

    Select sum(1) From master.sys.columns c1 Cross Join master.sys.columns c2

    or more usually

    Select count(*) From master.sys.columns c1 Cross Join master.sys.columns c2

    Or more efficiently (less I/O)...

    select count(*)*count(*) from master.sys.columns

    Comparison...

    (1 row(s) affected)

    Table 'syscolpars'. Scan count 2, logical reads 14, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    (1 row(s) affected)

    Table 'syscolpars'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Derek

  • If you saw something like

    ........Select Top 1000000000 Into Numbers

    ........From ProductionTableA, ProductionTableB...

    Would you not be at least a little suspicious?

    /*I couldn't resist jumping into this conversation here. I actually think you have something of a point in some of your posts, but this point rings false to me. I would not be suspicious because I would "never" allow code in my database that wasn't documented. The comment above your sample code would clearly say what was going on and why. No confusion or suspicion or need to think even. That's routine for all my code.

    I believe that self-documenting code is a myth. Documenting is appropriate just about all the time. It is not something one does just for "out-there" code. If you document all the time and adequately, then code such as the above code should be relatively easy for any developer past the very, very beginning level to understand.

    I document more than anyone else I've ever seen. But even people who document half as much as I and who would give some basic comment about the above code would have the concept covered/explained adequately for any review by peers. */

  • Documentation of code is tricky subject in my experience

    Yes I do comment my code and yes it is needed to some degree but there are some risks associated with it and certainly when you overdo it.

    First, natural language is not as exact, nor as efficient as code with respect to describing things accurately. Over time discrepancies will arise that can lead to confusion or misdirection. My main point here is that documentation is not a substitute for writing code in a way that expresses intent clearly.

    For example, a function call to GenerateRandomNumer is more informative and better at abstraction then some newID call plus some math/cast code to arrive at a number that is random as well. A comment just makes clear the intention of the writer, but it does nothing to make the code itself more open to analysis and understanding. Nor does it improve modular testing in case of an issue. For this very reason comments need often be ignored and you still have to read what is really going on instead of assuming comments are spot on all the time. Again comments are not a fully qualified substitute.

    Abstraction of logic into functions does not have this problem and when properly applied can even speed up the set based code. I seen this before and it is as if the optimizer does a few extra steps that it otherwise would not take.

    Just my 2c

  • Thomas (4/16/2009)


    Heh... that's a heck of a switch

    in gears from the "intuitive" stance

    you took just a couple of posts ago.

    ...In very rare cases, using a cursor for clarity can make sense specifically because most developers can easily understand them. Someone mentioned the example of not using objects in C#. A better analogy would be the use of a clever solution with anonymous methods. I did this very thing many months ago. After having to re-explain it to three developers and a consultant, I broke down and rewrote without the anonymous method so it would be clearer. It is possible that my revised version did not perform as well but the gains in readability and maintenance outweigh those costs unless the performance difference is prohibitive. "Clever" solutions can sometimes fit into this mold....

    We used to call this "coding to the lowest common denominator". Basically you can't use any features, design patterns, etc., that are above the level of the least experienced and knowedgeable people in the group. I once consulted at a company for a very short time where the lowest common denominator rule was in heavy usage. Everything was cursorized and the entire system crawled along horribly. They even used nested cursors to duplicate INNER JOIN functionality. The lowest common denominator among the development staff happened to be a C++ programmer who thoroughly understood the concept of nested loops, but couldn't quite grasp how to join the contents of two tables without explicitly declaring those loops himself.

    Rather than raising the skill level among the lowest common denominator and implementing more efficient solutions this company opted for the "easy-to-understand" solution: use WITH (NOLOCK) hints and make as many cursors as possible READ ONLY/FORWARD ONLY.

    The performance gains were on the order of slightly less than 1%.

    Where do you draw the line of compromise when you code for the lowest common denominators?

  • The article was well written and funny, but I do have one beef with it. It lacked the qualifier "When execution speed is more important than anything." Since cursors tend to be easier to write, if execution performance isn't important, or not as important as other optimizations (i.e., got to get this written ASAP), then cursors may be "better".

    For example, in the Pharma industry, there isn't that much data, plus, there are a lot of one-of's written. In that case, optimizing your code for execution speed is a waste of time. Who cares whether it takes 10 minutes or 10 hours? I need the info by tomorrow.

  • Stephen Hirsch (4/17/2009)


    . . .cursors tend to be easier to write . . .

    I think you'll get some argument on this. Cursors are easier to understand for someone coming from a procedural programming background -- like that C++ programmer Mike C tells us about in his last post. But one of the main points that I think RBarry is making in his article is that once a developer understands the mind-set of declarative programming and set-based solutions, they are often actually easier to understand.

Viewing 15 posts - 241 through 255 (of 380 total)

You must be logged in to reply to this topic. Login to reply