January 21, 2006 at 11:03 pm
Ok, I have to ask. Through reading I have learned that cursors = BAD. Ok, so I don't use cursors. For one thing, compared to the alternatives, they are a pain in the ass to program. On top of that, word on the street is that they are terrible for performance and should be avoided. Ok, great: a pain in the ass AND bad to use. For once in life, something that is bad for you is also undesirable.
Moving along: temp tables are easy to deal with, and table variables are very easy to deal with.
In my daily life I use table variables all the time. I use them whenever I am dealing with a small data set (roughly 5,000 records or fewer, depending on the number of columns and such). Above that, I typically turn to temp tables.
I find that this approach works well and I have not come across any limitations. For my own education, however, I would like to know WHY cursors are so bad. Is my use of these non-physical tables considered best practice? If the methods I list here are preferred, then why do cursors exist? Is there a time when I should consider using them? Would my life be much easier, in some cases, if I were to use a cursor?
Thank you in advance for your reply. Please reply with sound facts rather than opinion, as I am looking for a solid factual basis behind the methods I have adopted.
Regards,
Jeffery Williams
http://www.linkedin.com/in/jwilliamsoh
January 22, 2006 at 12:52 am
Jeffrey,
As you are probably aware (you ARE an experienced CIO), any RBAR (a "Modenism" for "Row By Agonizing Row", pronounced "ree-bar") is ALWAYS slower than an equivalent set-based operation. Cursors are an inescapable form of RBAR. And, so far as I'm concerned, creating a temp table (or table variable) and stepping through it with a WHILE loop is just as slothful and inefficient a manner of SQL processing as a cursor. My feeling is, any form of RBAR means that the Developer has failed in his/her understanding of what an RDBMS is all about.
That notwithstanding, what makes them so bad? From my readings and personal experience, it's the extra locks they sometimes place on data and the differences in concurrency. As you suggested, please don't take my opinion on it... look up "cursor, locking" in Books Online (there are multiple entries, including "Concurrency", etc.... they all apply) and take a peek at the following URL, which is chock full of info about the locking and concurrency of cursors as well. Then you'll be equipped to come to your own conclusions instead of listening to the millions of zealots and SQL "clones" that are likely to answer this post of yours...
http://msdn.microsoft.com/library/en-us/acdata/ac_8_con_07_633r.asp?frame=true
--Jeff Moden
Change is inevitable... Change for the better is not.
January 23, 2006 at 7:05 am
Jeff,
Good answer. Concurrency is certainly an issue, together with the row locking. I had considered locking, but had not given any thought to concurrency issues.
On the other side of the coin, I would disagree with your statement that "any form of RBAR means that the Developer has failed in his/her understanding of what an RDBMS is all about". I am interested in why you see it this way. I suppose it would matter why these methods are being used in the first place.
Do you have an example where temp tables or table variables are being used but, under a proper design, would not have been needed? What about a situation where their use is appropriate?
I ask because I do use them. Not on a daily basis, but I run across many situations where, at least from my perspective, they are needed. As an example: reading in a set of staging data. I might grab a set of 1,000 records that are in a pending state, perhaps waiting to be rated, rate the records, and write back a new status together with the additional data. Using a table variable I can accomplish this as a single read / single write. Additionally, comparisons between some of the records, perhaps even joins, are all done when the records are read in, further increasing the efficiency.
Because of load (the amount of new data being written to these staging tables) and/or record completeness (parts of the data come from diverse sources, and not always at the same time), the records cannot be mediated/rated at the time of insert.
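For concreteness, the single read / single write pattern described above might look something like the following sketch. The table and column names here are invented for illustration; they are not from the original post.

```sql
-- Hypothetical staging/rating pattern: one read, one write.
-- dbo.CallStaging, @Pending, and all column names are illustrative only.
DECLARE @Pending TABLE (CallID INT PRIMARY KEY, Duration INT, Rate MONEY)

-- Single read: pull a batch of pending records into the table variable
INSERT INTO @Pending (CallID, Duration, Rate)
SELECT TOP 1000 CallID, Duration, NULL
  FROM dbo.CallStaging
 WHERE Status = 'PENDING'

-- ... rating logic (comparisons, joins, etc.) runs against @Pending here ...

-- Single write: post the results back to the staging table
UPDATE cs
   SET cs.Rate   = p.Rate,
       cs.Status = 'RATED'
  FROM dbo.CallStaging cs
  JOIN @Pending p
    ON p.CallID = cs.CallID
```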
Re: “Then, you'll be equipped to come to your own conclusions instead of listening to the millions of zealots and SQL "clones" that are likely to answer this post of yours...” This is precisely why I wrote this post. While I have my own ideas about the answer to my own question, I wanted to hear from some people in this group. It seems, and it is probably the case in any field, that there are lots of drones out there. You know the situation: I heard XXX about YYY, so this is the way I do it; then they tell two people, and so on... Before you know it there is this mass group all doing XXX. They don't really know why; it is just the way it is done. Or is it? Like I tell people, there are lots of ways to get something to work. Typically there are only one or two right ways.
I was hoping either to validate the methods that I have employed through trial / error / performance testing, or to learn something new. In part, I have been told that if you must use some sort of 'temp' table, a cursor should be avoided in favor of temp tables or table variables. The problem is nobody has been able to answer why. Why is a cursor so much worse, and on the other side, what makes a table variable so much better? Locking and concurrency are a couple of good reasons.
Regards,
Jeffery Williams
http://www.linkedin.com/in/jwilliamsoh
January 23, 2006 at 7:23 am
Jeffery - Sounds as though you have found a good reason for using a loop rather than folding the loop into the SQL.
But are you sure that these 'parts of the data coming from diverse sources and not always at the same time' cannot be placed into a temp table first, and thus included in the SQL with the rest of your data?
I occasionally use loops, either for monitoring or for debugging (or both). For instance, we have some processes which can literally take days to complete, and we need a clear, easy-to-read log of progress, so we use a loop in this manner:
Create Cursor for all outstanding jobs
Open cursor
While Cursor
    Report start of job
    Step 1 SQL
    Report Step 1 success
    Step 2 SQL
    Report Step 2 success
    etc.
    Report Job end
    Check for STOP status and exit if necessary
Loop
This way we can always see what is going on and even stop cleanly if we need to modify something.
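As a rough T-SQL sketch of the outline above (the `dbo.Jobs` and `dbo.JobLog` objects and their columns are invented names, not from the original post):

```sql
-- Hypothetical monitoring loop; all object names are illustrative.
DECLARE @JobID INT

DECLARE JobCursor CURSOR FAST_FORWARD FOR
 SELECT JobID FROM dbo.Jobs WHERE Status = 'OUTSTANDING'

OPEN JobCursor
FETCH NEXT FROM JobCursor INTO @JobID

WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO dbo.JobLog (JobID, Message) VALUES (@JobID, 'Job started')

    -- Step 1 SQL would run here
    INSERT INTO dbo.JobLog (JobID, Message) VALUES (@JobID, 'Step 1 complete')

    -- Step 2 SQL would run here
    INSERT INTO dbo.JobLog (JobID, Message) VALUES (@JobID, 'Step 2 complete')

    INSERT INTO dbo.JobLog (JobID, Message) VALUES (@JobID, 'Job finished')

    -- Exit cleanly if a stop has been requested
    IF EXISTS (SELECT * FROM dbo.Jobs WHERE Status = 'STOP')
        BREAK

    FETCH NEXT FROM JobCursor INTO @JobID
END

CLOSE JobCursor
DEALLOCATE JobCursor
```

Note that the loop here is per job, with each step doing set-based work inside it, which is the use the poster describes: logging and clean shutdown, not row-by-row data processing.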
For debugging, I use loops occasionally where I need to find the exact piece of data that is causing an error (and thus save myself time searching through thousands of rows). In these cases I would break out a loopable item from a dataset, modify the SQL to use it as a parameter, and then run the SQL in a loop.
(Marvin)
January 23, 2006 at 7:25 am
Oh - one other thing about cursors - don't forget you can make them more efficient by declaring them fast forward-only (FAST_FORWARD) and the like.
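For reference, such a declaration looks like this (the cursor, table, and column names are illustrative). FAST_FORWARD is shorthand for a FORWARD_ONLY, READ_ONLY cursor with internal performance optimizations enabled:

```sql
-- Read-only, forward-only "firehose" style cursor; names are illustrative.
DECLARE MyCursor CURSOR LOCAL FAST_FORWARD FOR
 SELECT SomeColumn
   FROM dbo.SomeTable
```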
(Marvin)
January 23, 2006 at 8:29 am
Giles,
Yes, I am sure that the data is propagated from various sources and not at the same time. Additionally, in some cases, like the example I gave, we are talking about rating phone calls. This is rating at the wholesale level, and the rate or discount applied varies depending on call volume. This is yet another reason why this process happens at end-of-period rather than in real time.
Regards,
Jeffery Williams
http://www.linkedin.com/in/jwilliamsoh
January 23, 2006 at 9:24 am
Set-based operations, in my experience, are always much more efficient than row-based operations using loops of any kind. I would recommend not using cursors, period. Any other row- or loop-based operation should only be used when a set-based statement cannot be created. In saying this, I do need to state that I have used temp tables and table variables at times when I could not come up with a set-based method.
As an example of how you could change your thinking from row/loop-based to set-based, let's consider the rating example you've given. It looks like you are using a table variable to store your recordset and then looping through it to update columns based on your rating methodology. Set-based thinking would, instead of using a loop to process one row at a time, use one or a group of UPDATE statements to implement your ratings. So instead of getting one row, using conditional logic to determine what rating to give it, and then updating the rating, you would write an UPDATE statement and put your conditional checks in the WHERE clause. This may get you a fairly complex-looking WHERE clause, but from what I've found, if you compare the costs of these two methods, set-based will win the vast majority of the time.
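To make that concrete, here is a hedged sketch of conditional rating logic folded into a single UPDATE. The table, columns, and rate tiers are all invented for illustration; a CASE expression is used alongside the WHERE clause for the conditional part:

```sql
-- One set-based pass instead of a row-at-a-time loop.
-- dbo.CallStaging and the rating rules are illustrative only.
UPDATE dbo.CallStaging
   SET Rate = CASE
                  WHEN Duration <= 60  THEN 0.05   -- short calls
                  WHEN Duration <= 600 THEN 0.04   -- medium calls
                  ELSE 0.03                        -- long calls
              END,
       Status = 'RATED'
 WHERE Status = 'PENDING'
```

The engine applies the conditions to every qualifying row in one statement, which is what lets the optimizer use indexes and batch the I/O rather than paying per-row overhead.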
January 23, 2006 at 9:57 pm
Jeffrey,
First, I say "you" a lot in the following write-up... I don't mean you personally and hope you don't take it that way (personally). Your questions are outstanding and instead of sugar coating an answer, I just wrote this as if we were talking face to face over a really good cup of coffee.
You wrote:
On the other side of the coin I would disagree with your analysis of "My feeling is, any form of RBAR means that the Developer has failed in his/her understanding of what an RDBMS is all about". I am interested in why you see it this way? I suppose it would matter why these methods are being used in the first place.
Let’s tackle that first. In the "old" days, we kept our data in files. The traditional method of processing records was to read a row, process a row, then write the row out. Or, maybe even something more serious like accumulating data for averages and sums, but it still boiled down to read a row, process a row. This is called "ISAM" or Indexed Sequential Access Method and is still a time-honored method for processing data… just not in a database.
In those "old" days, the next thing we were usually taught right after the classic "Hello World" problem, was how to create a loop to count from 1 to some number, displaying each number as we went. People tend to actually think in such a serial manner because it’s easy and step by step. Take a number, add 1 to it, display the result, check and see if we’re done. Like I said, it’s easy. It’s so easy that RDBMS’s like Oracle have a huge number of tools to support this serial thinking.
Then, along comes a different breed of databases that support "set-based" thinking. Instead of saying "Write a routine to count from 1 to 10", we now have the power to simply say "Write a routine to produce a set of all numbers from 1 to 10", and it’s very fast. Still, being creatures of both habit and serial thinking, we doggedly continue to write such routines to count instead of produce. Because we can’t think in the set-based paradigm, we continue to let our serial minds control how we program. We have failed in our understanding of what an RDBMS is all about… the set-based processing of huge amounts of data.
Soooo… let’s get back to our original problem of counting from 1 to some number… Let’s revisit how a good number of people would do it for counting from 1 to a million… and, to keep from contaminating the results with varying display speeds, we’ll just populate a table with the numbers…
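(The code from the original post did not survive in this copy of the thread. The following is a sketch of the kind of WHILE loop being described; the temp table and variable names are my own.)

```sql
-- RBAR version: insert the numbers 1 through 1,000,000 one row at a time.
CREATE TABLE #Numbers (N INT NOT NULL)

DECLARE @Counter INT
SET @Counter = 1

WHILE @Counter <= 1000000
BEGIN
    INSERT INTO #Numbers (N) VALUES (@Counter)
    SET @Counter = @Counter + 1
END
```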
On my humble desktop machine, that little bit o’ code produced the following stats:
Seconds approximate elapsed time
Seconds approximate CPU time
Total disk reads and writes
KB approximate memory usage
Not bad for a nice tight little loop… but if I quit here, I will have failed to use the power of the RDBMS to its full extent and, because of that, I will have failed as a Developer. Let’s think about it… I don’t really want to count from 1 to a million… I want to produce a result set from 1 to a million… here’s the set-based equivalent of what we just did…
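(The set-based code also did not survive in this copy of the thread. The classic technique the post alludes to, a controlled cross join against a system table, looks roughly like this; the temp table name is my own.)

```sql
-- Set-based version: one statement, no loop.
-- Cross joining a sizeable system table with itself easily yields more
-- than a million rows; TOP caps the result at exactly one million.
SELECT TOP 1000000
       IDENTITY(INT, 1, 1) AS N
  INTO #Numbers
  FROM master.dbo.syscolumns sc1
 CROSS JOIN master.dbo.syscolumns sc2
```

The IDENTITY() function (legal only in SELECT ... INTO) numbers the rows as they land in the table, which is what makes the cross join "predictable and controlled".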
…and here’s the stats…
Seconds approximate elapsed time (39 seconds LESS)
Seconds approximate CPU time (34.1 seconds LESS)
Total disk reads and writes (73 reads/writes LESS)
KB approximate memory usage (8.1 KB LESS)
No loops, is nasty fast, makes good use of several of the tools available in the RDBMS including a predictable and controlled cross-join, and actually takes fewer resources to boot!
Yeah, yeah… here they come… you know them… they’re the ones that say "It doesn’t matter how fast something runs so long as it runs correctly and the data is protected." Horse-muffins! They also say "It's not whether you win or lose, it's how you play the game." Obviously, they didn't win the game.
The above code is just a tiny example of the speed you can get out of the RDBMS and still do it right and still protect the data. And, if you don’t think speed is important, just wait until the first time your boss says "I want you to run the XYZ job and you can’t go home until it finishes" and the bloody thing takes 8 hours to run! You’ll be wishing for the speed, then, I guarantee! But you won’t get the speed using ANY form of RBAR!!!
I told you that long winded story so I can tell you this one...
The good news is, there’s nothing that can’t be done in a good RDBMS without RBAR. Oh sure, I’ve used WHILE loops to control the number of passes on a table , but each pass was for millions of records, not one painful record per loop. Let me say it again, if you use RBAR methods, you’ve failed to understand how to use an RDBMS which means you’ve failed as an SQL Developer.
You also wrote:
Do you have an example of when things like Temp tables, variable tables are being used and under a proper design they would not have been needed? What about a situation where there use is appropriate?
I may have confused you when I said that replacing a Cursor with a Temp Table and WHILE loop is a bad thing (and it still is a bad thing). I can see where you might think that I don’t believe in the use of Temp Tables. And, that’s not true at all. I’m a big believer in the proper use of Temp Tables to store interim results. I’ve turned many an 8 hour run into a 20 minute run by using these wonderful scratchpads known as Temp Tables. I’ll also say that using Temp Tables inappropriately is the worst thing you could do and replacing a Cursor with something else that does RBAR (Row By Agonizing Row), Temp Table or not, is a bit like having an orgy to support an abstinence movement. As fun as it might be and for all the good you think it may be doing, it just isn’t going to work.
For Table VARIABLES, the only place you should use them is in a function that requires a temporary structure that looks like a table, and then you better keep it small … forget about all the myths about how good Table VARIABLES are… they’re just not true. However, instead of taking my word for it, check out the following URL… particularly questions 2, 3 and 4…
http://support.microsoft.com/default.aspx?scid=kb;en-us;305977&Product=sql2k
The ones I really like are the one about how Table VARIABLES can’t use statistics (essential for speed) and the one about how both Table VARIABLES and Temp Tables start out in memory and spill over to TempDB when they need to. Clearly, Temp Tables are head and shoulders above Table VARIABLES for performance and usability. The only reason I ever use a Table VARIABLE is that you can’t reference a Temp Table in a UDF.
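One concrete difference worth spelling out (a sketch with invented names; this reflects SQL Server 2000-era behavior, which is the version under discussion): a temp table can take additional indexes after creation and gets column statistics, while a table variable only gets the indexes that come with an inline PRIMARY KEY or UNIQUE constraint.

```sql
-- Temp table: can be indexed after the fact, and gets statistics.
CREATE TABLE #Work (CustomerID INT, OrderDate DATETIME)
CREATE INDEX IX_Work_OrderDate ON #Work (OrderDate)

-- Table variable: inline constraints only; no CREATE INDEX, no statistics.
DECLARE @Work TABLE (CustomerID INT PRIMARY KEY, OrderDate DATETIME)
```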
You also wrote:
In part, I have been told that if you must use some sort of ‘temp’ table, a cursor should be avoided in favor of temp tables or table variables. The problem is nobody has been able to answer why. Why is a cursor so much worse, and on the other side, what makes a table variable so much better?
The use of Cursors or Temp Tables in conjunction with a WHILE loop seems, well, stupid to me. A "firehose" cursor is just about as effective as a Temp Table/WHILE loop combination, so it’s nothing more than a useless but good-feeling patch to change it from one to the other... makes people feel like they did something smart. What I’m here to tell you is that Cursors and any form of RBAR WHILE loop should be rewritten in a set-based fashion. If that requires you to store interim sets in a Temp Table, I’ve got no problem with that… just don’t do any of it one agonizing row at a time.
And that was my whole point in the original post... Just changing from one form of RBAR to another is mostly a waste of time. The change from Cursors to Temp Table/WHILE loops might get rid of some locking issues, some concurrency issues, and maybe run a tiny bit faster but it's still a very bad thing to write RBAR... if you're looking for a quick fix instead of doing it right, then just change the Cursor to a firehose cursor and stop wasting your time...
--Jeff Moden
Change is inevitable... Change for the better is not.
January 24, 2006 at 10:25 am
You 'da man, Jeff. That is an awesome post. I have been attempting to get some of our developers to understand the importance of writing set-based code over cursors and temp table/WHILE loop combos. This is a great post and I can't wait to distribute it to our staff.
January 24, 2006 at 10:34 am
Jeff,
That is an amazing post. Thank you for the detail. I am going to play with some of your suggestions and compare my results. It all makes sense however.
Regards,
Jeffery Williams
http://www.linkedin.com/in/jwilliamsoh
January 24, 2006 at 5:12 pm
John and Jeffrey,
Thank you both for taking the time to read the post and understand it. Don't be too disappointed if you can't convince people to give up the RBAR habit right away... it's a tough habit to break, and many will insist that "it can't be done that way". It took me almost a year to convince management at my place of work that they needed to invest in rewriting some of the code to be set-based. What finally convinced them was when I (on my own time) took an archiving job that took almost 8 hours to fail on a single table and made it run successfully on 6 tables in less than 40 minutes (each table being between 4 million and 100 million rows). The other thing was when I took a customer transfer process that took 30 minutes, and also blocked the database company-wide for 10 minutes 3 times a day, and got it to run in 4 seconds (3.91, actually).
You'll likely need to demonstrate the same thing... it would be my sincere pleasure to help if they need another comparison.
Thanks again for the great compliment! You both made the effort very worthwhile
--Jeff Moden
Change is inevitable... Change for the better is not.
January 24, 2006 at 8:11 pm
I am one of 'THOSE' with that tough habit to break, and I have learned something new. I really enjoy programming in SQL and pride myself on efficient database design (and programming). This has uncovered an area that calls my practices into question. I will be spending some time learning this process and becoming proficient with this new skill set. In comparing my SQL to that of most others I come across, it is often superior; with the dramatic performance variance that you illustrate here, it really calls into question some of my own practices, and particularly those of the people I have compared against in the past. My hat is off to you. I will gladly take this knowledge to heart.
Regards,
Jeffery Williams
http://www.linkedin.com/in/jwilliamsoh
February 7, 2006 at 6:05 am
Jeff, I agree with your discussion about row-by-row processing; however, I disagree about temp tables vs. table variables. My points are:
1. I don't need statistics on temp tables, because I usually know enough about their structure and the data that will populate them to force an optimal execution plan.
2. The main drawback of a table variable is that you can't create additional indexes beyond the primary key. However, more often than not I don't need additional indexes.
3. Temp tables cause stored procedure recompilations, so in a high-traffic environment where a procedure may be executed thousands of times an hour, recompilations significantly affect CPU usage.
The conclusion: both table variables and temp tables are "good"; you apply one or the other based on the problem at hand.