December 29, 2006 at 11:25 am
Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/awarren/2785.asp
January 15, 2007 at 3:02 am
Derek Slinn
January 15, 2007 at 3:23 am
I think it's always difficult to get this point across about cursors and how transactions work! I totally agree (with Andy).
Sorry to burst the bubble, but a cursor and a WHILE loop are the same RBAR (thanks Rudy): a WHILE loop is a cursor in disguise.
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
January 15, 2007 at 4:37 am
A statement like "There is no excuse for ever using a cursor" is a very bold one.
I have a reasonably complex (50 tables) but not large (400 MB) database running behind a busy web site.
Once a week it mails a list of updates to subscribers based on a range of criteria.
I have one stored procedure that retrieves the records from 9 tables and formats them (using a cursor and fetch).
I have a 2nd stored procedure that contains a Mailing program (CDO) and accepts "To", "Subject", "MailBody" arguments from the 1st SP.
I used Andy's "If you can't think of any other way of doing it" criteria before going the cursor route and I believe my solution is valid and as efficient as any other method in achieving the objective. It can send a mail to around 1,000 users in 35 seconds.
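For a sense of the shape of it (a rough sketch only; the proc, table, and column names here are made up for illustration, not the actual code):

DECLARE @To varchar(255), @Subject varchar(255), @MailBody varchar(8000)

DECLARE mail_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT s.Email, u.Subject, u.Body   -- already formatted from the source tables
    FROM dbo.Subscribers s
    JOIN dbo.WeeklyUpdates u ON u.SubscriberID = s.SubscriberID

OPEN mail_cur
FETCH NEXT FROM mail_cur INTO @To, @Subject, @MailBody
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.usp_SendMail @To, @Subject, @MailBody   -- the 2nd (CDO) proc
    FETCH NEXT FROM mail_cur INTO @To, @Subject, @MailBody
END
CLOSE mail_cur
DEALLOCATE mail_cur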
One major plus to this approach is the fact that when Microsoft updated CDONTS to CDO a while back and the system failed, I knew which SP needed attention and didn't have to try to pick out the CDONTS code from the Query and Formatting code (why does code make perfect sense when you write it, but look like a meaningless string of characters within a few days?).
I am not sure I could have achieved all this without a Cursor. Other people may have been able to, but not me.
Cheers
CP
January 15, 2007 at 6:15 am
If you really, really can't find a way to do it in SQL other than using a cursor, stop right there, give your head a shake, and pass the problem on to the application. Sequential operations are much more efficiently processed outside of SQL if they cannot be solved with set-based queries.
For example, the mail merge issue was one we encountered as well, so we retrieve all the data and hand it off to the middle tier, where sequential operations can be processed much more efficiently and with much greater flexibility. SQL is a set-based language, designed to be efficient at set operations, not at sequential ones. We can handle up to 100K emails easily, the biggest bottleneck being the mail server itself, not the code.
XML is one of the best things since sliced bread for DBAs. It avoids having to use cursors to process arrays.
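For example, a list of IDs can be passed in as XML and shredded straight into a set (a minimal sketch using SQL 2005 syntax; the names are illustrative):

DECLARE @ids xml
SET @ids = '<ids><id>1</id><id>2</id><id>3</id></ids>'

SELECT x.i.value('.', 'int') AS CustomerID
FROM @ids.nodes('/ids/id') AS x(i)
-- join the result to a table instead of looping through the values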
Proper and efficient use of dynamic tables is another skill good DBAs need to understand and utilize, and can be a very efficient way to avoid using cursors.
In SQL 2005, CTEs and several other new features all are useful at avoiding cursors.
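For instance, a recursive CTE can replace the classic "walk the hierarchy with a cursor" routine (a sketch; the Employees table and its columns are illustrative):

WITH OrgChart (EmployeeID, ManagerID, Lvl) AS
(
    SELECT EmployeeID, ManagerID, 0
    FROM dbo.Employees
    WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.EmployeeID, e.ManagerID, o.Lvl + 1
    FROM dbo.Employees e
    JOIN OrgChart o ON e.ManagerID = o.EmployeeID
)
SELECT EmployeeID, Lvl
FROM OrgChart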
I have not used cursors in SQL since 6.0, and when I have caught junior DBAs and developers using them, I cuff their ears and show them how to code SQL.
January 15, 2007 at 6:28 am
I am a developer who has always been drilled with the "cursors are the spawn of the devil" mantra... Of course, there are also people who say "Windows is evil", or "Mac OS is best", or whatever. My point here is not to start a holy war, but to point out that the intelligent developer is able to take a critical look at all of the tools that are available to them and realize that each tool has the job it handles best. Just because I think hammers are bad, that doesn't mean I should use a screwdriver to pound in a nail.
Case in point, I had to create a process (DTS Package) for a client that pulls data in from a few tables in their legacy system via ODBC. It also pulls data from an Access database. It scrubs, combines, reformats, and normalizes this data into tables in my SQL2000 database. I suppose it would have been possible to come up with a set-based approach to this (at least, once the data was brought into staging tables) and it would maybe have run faster. However, execution speed was a very low priority when compared with getting the project done within the deadline.
As for the notion of dealing with the iteration in the application, rather than in the database, well, that simply isn't always possible. In the case I mentioned above, the DTS package is the application. Besides, sometimes it is really important to encapsulate business logic in the database and not require application developers to be aware of it.
Don't get me wrong, I ALWAYS try to avoid cursors if I can manage it within the time constraints of a given project. But saying that cursors are bad and should never be used is something of an idealistic statement. It sounds really good until you have real clients with real budget and time constraints.
January 15, 2007 at 6:30 am
Colin, I totally agree about the while loop = cursor. I haven't measured the true overhead of each, but I would be surprised if the cursor wasn't cheaper in some cases since the while loop is re-running the qualifying query once for each iteration.
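To make the comparison concrete, here are the two row-by-row patterns side by side (a sketch; dbo.Orders and usp_ProcessOrder are made-up names). The WHILE version re-runs its qualifying SELECT on every pass; the cursor qualifies the rows once:

DECLARE @ID int

-- WHILE loop: one qualifying query per iteration
SELECT @ID = MIN(OrderID) FROM dbo.Orders
WHILE @ID IS NOT NULL
BEGIN
    EXEC dbo.usp_ProcessOrder @ID
    SELECT @ID = MIN(OrderID) FROM dbo.Orders WHERE OrderID > @ID
END

-- cursor: rows qualified once, then fetched
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT OrderID FROM dbo.Orders
OPEN c
FETCH NEXT FROM c INTO @ID
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.usp_ProcessOrder @ID
    FETCH NEXT FROM c INTO @ID
END
CLOSE c
DEALLOCATE c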
I see I didn't make my argument quite convincing enough!
I don't agree with moving code to the middle tier just because I need a cursor. There are definitely times when you need better string handling (though that is reduced with .NET code availability in 2005) or other rich tools, but too many times I've seen the work that gets moved translate into 100k network calls, one for each row. It doesn't have to be that way, of course, but it's easy for developers to loop and call a proc thinking, 'hey, I'm using stored procedures, I'm the man!'
As for XML, I agree it makes a decent pseudo array. If it fits, great.
So, to those who disagree with me and spout the "cursors are evil" line - have you gone back and really proved that case to yourself, or is it just dogma that was beaten into you by a senior DBA back when you were learning to write selects?
Thanks for reading the article and the comments so far.
January 15, 2007 at 6:30 am
I was interested by Colin's remark that a WHILE loop is a cursor in disguise. What are the comparative timings?
Best wishes,
Phil Factor
January 15, 2007 at 6:36 am
Well, at least Phil read the article!
January 15, 2007 at 6:59 am
Nice article Andy. I think it is always good to challenge long held "well everyone knows that" type of knowledge. I agree that cursors are not evil and are often the best solution, regardless of overhead. The extra overhead doesn't matter if you have the processing power available in the desired time frame.
I also respectfully disagree with the gentleman who wants to move the process to the application. In my case, more and more of the application is in the database. The application only comes into play when a user browses to it, which is not as conducive to scheduled processes as using SQL alone. Again, it is relative to your needs and situation.
Thanks again Andy. You always bring good perspective.
Hugh
January 15, 2007 at 7:05 am
If you wrote a proc, usp_DoBatch...
pseudo-code (tidied into runnable T-SQL; "Do Something" remains a placeholder for the real work)

DECLARE @i int
BEGIN TRAN             -- outer tran
SET @i = 1
WHILE @i <= 10000
BEGIN
    IF @i % 100 = 1
        BEGIN TRAN     -- open a batch of 100
    -- Do Something
    IF @i % 100 = 0
        COMMIT TRAN    -- commit the batch
    SET @i = @i + 1
END
COMMIT TRAN            -- outer commit
Would this still cause the t-log to grow because of the outer tran, as if we just did one big tran? Or does this allow the t-log size to be manageable? I'm guessing not...
Interesting article though, Andy. I find cursors useful when I need to massage data from an existing table structure into a new structure.
Thanks,
Brian
Think great, be great!
January 15, 2007 at 7:09 am
Yes, the outer tran would hurt you. You replace it by just building in some type of restartability (or planning for it if something breaks). As long as the internal work is committed or rolled back, you could restart the loop again if needed without much headache. Note that the same applies to cursors: just using one doesn't change the effect on the transaction log, it's just a looping construct (with some extra features).
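One way to sketch that restartability (table and column names here are illustrative): drop the outer tran, commit each batch on its own, and derive the resume point from what has already been committed:

DECLARE @LastID int
SELECT @LastID = COALESCE(MAX(ID), 0) FROM dbo.Target   -- resume point

WHILE EXISTS (SELECT * FROM dbo.Source WHERE ID > @LastID)
BEGIN
    BEGIN TRAN
    INSERT dbo.Target (ID, Payload)
    SELECT TOP 100 ID, Payload
    FROM dbo.Source
    WHERE ID > @LastID
    ORDER BY ID
    COMMIT TRAN

    SELECT @LastID = MAX(ID) FROM dbo.Target
END

If the run dies halfway, the committed batches stay committed and a rerun picks up where it left off.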
January 15, 2007 at 7:22 am
I'm not dogmatic about not using cursors and will use them purely for convenience on occasion. However, in the example in the article the outer cursor was used just to keep you from having to rewrite your code to handle multiple rows at a time. I would consider that a bad use of a cursor. You are creating a huge amount of calling overhead and eliminating the benefit of using SQL in order to reuse code that handles a single row when, according to your description, you will never (or rarely) need to handle a single row. If there is concern about batch size and tlog size then you can handle that in your SQL code as well.
JimFive
January 15, 2007 at 7:25 am
DTS & SSIS packages can eliminate the looping activity - they may not be as efficient as a stored procedure but they sure can accomplish a lot.
I like cursors in many situations. Other than infinite loops, what problems do cursors present?
January 15, 2007 at 7:30 am
SQL cursors are very useful and powerful because they offer a high degree of row-level data manipulation, but this power comes at a price in performance. In simple cases there may be no performance difference, but with nested cursors there will be.
In addition, if a cursor is not closed for whatever reason, the process will leak memory.
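A defensive pattern for that (a sketch using SQL 2005 TRY/CATCH; the cursor name and query are illustrative) is to check CURSOR_STATUS in the error handler so the cursor is always closed and deallocated:

DECLARE @name sysname
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.objects

BEGIN TRY
    OPEN c
    FETCH NEXT FROM c INTO @name
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- per-row work that might fail goes here
        FETCH NEXT FROM c INTO @name
    END
    CLOSE c
    DEALLOCATE c
END TRY
BEGIN CATCH
    IF CURSOR_STATUS('local', 'c') >= 0
        CLOSE c        -- still open
    IF CURSOR_STATUS('local', 'c') >= -1
        DEALLOCATE c   -- still allocated
END CATCH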
In Microsoft SQL Server 2005, the SELECT statement in a cursor definition is subject to the same transaction locking rules that apply to any other SELECT statement. In cursors, however, an additional set of scroll locks can be acquired based on the specification of a cursor concurrency level.
The transaction locks acquired by any SELECT statement, including the SELECT statement in a cursor definition, are controlled by the transaction isolation level setting for the connection and any locking hints specified in the FROM clause.
These locks are held until the end of the current transaction for both cursors and independent SELECT statements. When SQL Server is running in autocommit mode, each individual SQL statement is a transaction and the locks are freed when the statement finishes. If SQL Server is running in explicit or implicit transaction mode, then the locks are held until the transaction is either committed or rolled back.
Performance is not only a function of the elapsed time a process (stored procedure) takes, but in a lot of cases of the way the process interacts with other processes in a multi-user environment. This is especially true of batch processes, data load processes, and reports: they can run for a long time with multiple processes in parallel, and different techniques can be used to minimize time and improve the way processes affect users and/or other processes.
It means that a process using a WHILE loop and a temporary table may interact better with the other running processes.
http://www.sql-server-performance.com/dp_no_cursors.asp
http://msdn2.microsoft.com/en-us/library/ms190211.aspx
http://www.sqlservercentral.com/columnists/lPeysakhovich/overviewofperformance.asp