February 17, 2009 at 10:26 am
I understand there's a lot of passion around the whole cursor-versus-set-based argument.
I had a programmer come to me a few moments ago with a question and I need some help.
I'm no programmer but I'm trying to get better to understand all facets of SQL Server.
The developer needs to take a set of around 100,000 rows, go through them, and execute a stored procedure based on what is in each row. They intended to use a cursor, but their question was whether it would be better to run the cursor at the application level or at the server level. Their concern was the cost of making 100,000 small calls to the database versus running one large 100,000-row cursor once at the database level.
I hope that makes sense. I wasn't given a ton of info so I may need to ask more questions.
MCITP, Database Administrator
A hodgepodge of Information Technology and Life
LinkedIn Profile
My Twitter
February 17, 2009 at 10:35 am
Just import (like bcp in) everything onto the server and process it there. Now, assuming it's something like insurance claims where the logic is just complex, the cursor might be unavoidable... or just too hard to avoid.
Try set-based on the server first. If you can post all the requirements, I'd like a crack at it ;).
February 17, 2009 at 10:38 am
The only time I saw the reverse process was when there were just massive amounts of processing to do... then it made sense to export everything to desktop apps and have them hack at it with the CPU at 100% for a couple of days, then just reimport the results.
February 17, 2009 at 10:45 am
I am going to assume the 100,000 rows you are talking about are already sitting in a table on the SQL Server, and that you don't have a text file you are trying to upload or anything.
First, SQL Server would perform best overall by eliminating the cursor altogether (if possible). There are a few things that cannot be done in a set-based way, but make sure you are not using a cursor where you don't have to. 100,000 rows is not a lot, but when you have to loop through them, things can get slow quickly.
Next, if you use a server-side cursor (the entire process in a stored procedure) you will eliminate a lot of network traffic and handshaking. Typically, fewer layers involved means better performance. Now, if on the client side you were to thread the process and run the stored procedure for each row without waiting for the previous call to finish, you might see a significant performance increase. Doing this would mean establishing multiple connections, but that may be acceptable even with the extra overhead. Of course, your procedure would have to be written to avoid blocking, deadlocks, and interference with parallel execution.
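To make the two styles concrete, here is a minimal T-SQL sketch of the contrast. The table `dbo.Claims`, the lookup table `dbo.Policies`, and the procedure `dbo.usp_ValidateClaim` are all hypothetical names invented for illustration, not anything from the original question.

```sql
-- Row-by-row (cursor) version: one procedure call per row.
DECLARE @ClaimID int;
DECLARE claim_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT ClaimID FROM dbo.Claims;
OPEN claim_cur;
FETCH NEXT FROM claim_cur INTO @ClaimID;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.usp_ValidateClaim @ClaimID;  -- 100,000 separate executions
    FETCH NEXT FROM claim_cur INTO @ClaimID;
END;
CLOSE claim_cur;
DEALLOCATE claim_cur;

-- Set-based version: one statement touches all rows at once,
-- e.g. flag every claim that references a missing policy.
UPDATE c
SET    c.IsValid = 0
FROM   dbo.Claims AS c
LEFT JOIN dbo.Policies AS p ON p.PolicyID = c.PolicyID
WHERE  p.PolicyID IS NULL;
```

The set-based statement lets the optimizer pick a join strategy for all 100,000 rows in one pass instead of paying per-row execution overhead 100,000 times.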
February 17, 2009 at 11:01 am
All in all... post the rest of the specs :).
February 17, 2009 at 11:52 am
Here's the question from the developer's mouth:
I have a process in a .NET application that needs to extract a set of records, approximately 100,000, from a SQL table. It then needs to process each extracted record individually to retrieve some additional information and perform validation checks on the record. This process involves additional calls to other tables.
The two options being considered are:
1) Loading all the records into a DataSet within the .NET application and then processing each record, calling stored procedures for each one.
2) Extracting the records within a stored procedure and then, within that same stored procedure, accessing each record sequentially (via a cursor) and performing the additional database calls.
Which is more efficient: having the .NET application make many simple stored procedure calls (selects only), or a more process-intensive stored procedure with cursors, selects, and processing logic that is called only once?
February 17, 2009 at 12:22 pm
[font="Verdana"]My suggestion would be as follows:
The reason cursors are "bad" is that they reinforce the "one row at a time" style of solution. It's not the cursor as such that is the problem; it's doing everything one row at a time.
Now, your developer has mentioned they want to go through the rows and run some validation against them. Can you suggest that they write a stored procedure to validate the entire set of rows (against some criteria) and return all of the failures in one go?
Both of the approaches you mention move the issue around rather than resolving it. One may be better than the other (by a factor of maybe two, perhaps). But if you move the validation back into the database and do it as a set, you can look at performance gains on the order of 100 times (I've seen better than that, in fact).
[/font]
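The "validate the whole set and return all the failures" idea can be sketched in T-SQL along these lines. Everything here is hypothetical (table names, the two validation rules, the threshold values); it only illustrates the shape of such a procedure.

```sql
-- Hypothetical sketch: validate the entire set in one pass and
-- return every failing row with a reason, instead of calling a
-- single-row procedure 100,000 times.
CREATE PROCEDURE dbo.usp_ValidateClaimSet
AS
BEGIN
    SET NOCOUNT ON;

    SELECT c.ClaimID, 'Missing policy' AS FailureReason
    FROM   dbo.Claims AS c
    LEFT JOIN dbo.Policies AS p ON p.PolicyID = c.PolicyID
    WHERE  p.PolicyID IS NULL

    UNION ALL

    SELECT c.ClaimID, 'Amount out of range'
    FROM   dbo.Claims AS c
    WHERE  c.Amount <= 0 OR c.Amount > 1000000;
END;
```

Each validation rule becomes one set-based query; the `UNION ALL` stitches the failures into a single result the application can consume in one round trip.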
February 17, 2009 at 2:54 pm
That's just a basic ETL task... which is all done in sets.
The server can handle it unless the logic is extremely complex (where it could take months to duplicate).
But since it's already written, please post the validation code. I'm sure we can do it set based, on the server.
February 17, 2009 at 3:07 pm
It's not already written. She came to me asking the question of what would be best.
February 17, 2009 at 3:24 pm
I would just turn around and ask her to tell you why it HAS to be one row at a time. If it comes across that that's simply what she is comfortable with (versus a compelling reason requiring it to be done one row at a time), then it's time to steer her in the set-based direction.
I've seen "simple validation" like that implemented in a cursor, and the cursor implementation took hours to run. Switch it to set-based, and the thing finished in 4-5 seconds.
It might not be within her comfort zone, but the performance difference is usually staggering.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 17, 2009 at 4:30 pm
I'll toss another on the pile recommending a set based solution. This seems like a perfect teaching/mentoring moment. At the very least she should be commended for coming to you for advice on the best possible solution rather than just hacking something together that might work.
-Luke.
February 17, 2009 at 8:57 pm
Matt Miller (2/17/2009)
I would just turn around and ask her to tell you why it HAS to be one row at a time. If it seems to come across that that's what she is comfortable with (versus a compelling reason requiring it to be done one row at a time), then it's time to steer her in the set-based direction. I've seen "simple validation" like that implemented in a cursor, and the cursor implementation takes hours to run. Switch it to set-based, and the thing finished in 4-5 seconds.
It might not be within her comfort zone, but the performance difference is usually staggering.
That's spot on, Matt! Damned near took the words right out of my mouth!
Jason, no one in your office is asking "WHY???" WHY do the rows have to be processed individually? If the answer is because another stored procedure was written to process only one row at a time, then it's time to roll up the proverbial sleeves and redesign that pup to work in a set-based fashion. Note that "set based" patently does NOT mean "all in one query".
Will the cursor work? Absolutely. But what happens if the scope increases (other than the company getting very rich) and you suddenly go from 100,000 rows to a million or two?
The time to plan to handle the big stuff is when stuff is still small enough to handle. And, please don't let anyone tell you that it "can't be done without a cursor". While that is sometimes true, it's only in 0.01% (one in ten thousand) of the cases (if that). The usual problem is that people just give up too early because they can't think in columns instead of rows.
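The point that "set based" does not mean "all in one query" is worth illustrating. A common middle ground is chunked processing: many rows per statement, but not necessarily the whole table in one transaction. This is a generic sketch with an invented table (`dbo.Claims`) and an arbitrary batch size.

```sql
-- Set-based does not have to mean one giant statement:
-- process in batches of 10,000 rows, still thousands of
-- rows per statement rather than one at a time.
DECLARE @BatchSize int = 10000;

WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) c
    SET    c.Processed = 1
    FROM   dbo.Claims AS c
    WHERE  c.Processed = 0;

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to do
END;
```

Batching like this keeps transactions and the log manageable if the row count grows from 100,000 to a few million, while retaining set-based performance within each batch.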
I'll add my own personal example... I rewrote a cursor-based dupe check that took 24 hours to run (and sometimes failed) against 62 tables of 4 to 8 million rows each, with every table living in a separate database. When I got done with the set-based rewrite, it handled the desired 93 tables instead of just 62, took only 11 minutes to run, and hasn't failed even once in the nearly two years it's been in service.
Avoid the loop... especially for something like what you're talking about, Jason.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 17, 2009 at 9:08 pm
Luke L (2/17/2009)
I'll toss another on the pile recommending a set based solution. This seems like a perfect teaching/mentoring moment. At the very least she should be commended for coming to you for advice on the best possible solution rather than just hacking something together that might work.
-Luke.
I've gotta agree with that... reward the messenger... always.
February 17, 2009 at 9:14 pm
Thanks for the tips, guys.
I'll be looking for resources with good examples for her.
If you have any, feel free to throw them my way.
It will mainly be a matter of changing the cursor (RBAR) mindset, but I've always told them that a cursor should be the last resort.
February 17, 2009 at 9:24 pm
Just a thought, but see if they will let you develop the procedure. It would give you the opportunity to get some of that experience, which you can then share with the developers in the future, and of course specifically with the one who came and asked the question. Sounds like a fun one!
Enjoy!
David
@SQLTentmaker
"He is no fool who gives what he cannot keep to gain that which he cannot lose" - Jim Elliot