November 23, 2004 at 3:56 pm
So what's up with all the cursor code I keep finding people posting?
How many think cursors are good?
How many think cursors are OK?
How many think cursors are a necessary evil?
How many think cursors are bad?
How many hiss and recoil like a vampire facing a crucifix when they see a cursor? (That would be me)
This is really part rant and part discussion: am I the only one here who finds the use of a cursor to be a complete failure of my personal programming abilities?
That is, in many, many years of designing and developing relational and SQL databases, I've only had to fall back on using a cursor (in SQL Server) one time. And note: (other than that one time) there was never a case where performance would have been better with a cursor, or where the business process was handled better with a cursor. Also note that, in the applications I have been responsible for, all of the business process logic has been contained in the DB (stored procs rule), and all of the data transformation for warehouse loads has been done in SQL.
I can understand the newbie that comes in posting a problem using a cursor, but I have a hard time understanding those replying with solutions that use cursors: Why do those with more experience perpetuate the poor programming practice of using cursors?
Those coming from Oracle or other databases had little or no choice in the matter (e.g. lack of temp tables), but once they've entered SQL Server land (Sybase or MS), why do they not embrace the new world? Perhaps it is because they're in Rome and the Romans (the rest of us) have not learned how to do things as the god (whoever that is) of relational theory intended?
This post may generate some ire, or it may generate silence, but despite my (semi)belligerent style, I hope to stimulate some discussion/introspection.
ducking and covering,
J
P.S. Yes there have been a very few times (though far less now that UDFs are available) when row by row processing was absolutely necessary. And yes, the code was faster NOT using the cursor and its overhead.
P.P.S. If I'm not run out of town with this one maybe I can start a fire with the "why are developers re-inventing the wheel by implementing their own user/security tables?" topic.
P.P.P.S. Oh yeah, that one time? It was only because I didn't have SA authority on the server.
November 24, 2004 at 3:20 am
OK I'll bite and be the first to jump in.
Cursor!! - don't make me larf, I'll do it using proper programming.
But seriously, when I first started SQL programming I'd come from applications programming, where everything you do is sequential, so I had difficulty getting my head around the set approach. Yes, we did sets and Venn diagrams and all that at school, but it didn't translate (in my mind) to real life.
So cursors were an easy option when faced with something difficult. I suspect that this is something most rookie SQL programmers still find.
With experience I realise (and agree with you) that there are very few situations where you cannot achieve your aim without looking at one row at a time.
I've found that even on those occasions where a cursor is the only answer, if I step back and look at the (very) big picture I've probably got my system design a bit wrong and am doing the processing in the wrong place (i.e. should be middle-tier processing NOT at the database).
So, 1 vote against.
Sam
November 24, 2004 at 3:43 am
I've always managed to avoid cursors in an actual application; however, I have used them for altering table design. Someone gave me a database with several tables that had lots of fields that were all char(300) when varchar was what was required. A cursor looping through the columns saved me a lot of time, so they do have uses, but I agree with Sam that if you have to use them in an application then you should be taking a step back and looking at the design.
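For what it's worth, a one-off like that can also be done without a cursor: a single SELECT against INFORMATION_SCHEMA can generate the ALTER statements, which you then copy out and run. A sketch, assuming the char(300) situation described above (table and column names come from the catalog; nothing else here is from the original post):

```sql
-- Generate one ALTER statement per char(300) column; run the output as a script.
SELECT 'ALTER TABLE [' + TABLE_NAME + '] ALTER COLUMN [' + COLUMN_NAME + '] varchar(300) '
       + CASE IS_NULLABLE WHEN 'YES' THEN 'NULL' ELSE 'NOT NULL' END
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE = 'char'
  AND CHARACTER_MAXIMUM_LENGTH = 300
```

The generated script is still reviewed and executed by hand, so the "loop" happens at script level rather than in a cursor.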
Regards,Iain
November 24, 2004 at 5:26 am
I wrote an algorithm that maximized profitability based on the sum of independent variables (sale items) that equaled a particular goal. For example, I want a 5% increase in my gross revenues and each sale item can fluctuate between -10% and +10%. I could just do an across the board 5% increase on each sale item. However, each sale item has different profitability. Some are loss leaders and others are very profitable. Based on historical volume I must calculate the best way to hit my goal.
In the problem above, I used a cursor because the algorithm I came up with kept accumulators as it evaluated each variable and its effect on profitability. I can't imagine having done this with set theory. I wanted to do this on the server end to keep my client process thin.
Since creating this process, it has gone through several generations. I now have moved the algorithm to an extended stored procedure using C++ for performance reasons.
I guess I would put it as "When complex logic comes into play for each row, a cursor may be necessary." In general, using a cursor for simply working with rows is not wise and can be done better with sets.
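Some accumulator-style logic can still be expressed as a set operation, at least for simple cases. A running total, for instance, can be done with a triangular self-join (this is only a sketch with an invented SaleItems table, not the full profitability algorithm above, which clearly carries more state; note the self-join scales as O(n²) in row count):

```sql
-- Running total of profit, ordered by item ID, without a cursor.
-- SaleItems(ItemID int, Profit money) is a hypothetical table for illustration.
SELECT s1.ItemID,
       RunningProfit = SUM(s2.Profit)   -- sum of all rows up to and including s1
FROM SaleItems s1
JOIN SaleItems s2
  ON s2.ItemID <= s1.ItemID
GROUP BY s1.ItemID
```

Whether this beats a cursor depends on row counts; for large tables the quadratic join can be slower than a single forward scan.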
November 24, 2004 at 5:39 am
"P.P.S. If I'm not run out of town with this one maybe I can start a fire with the "why are developers re-inventing the wheel by implementing their own user/security tables?" topic."
Not meaning to go off on a tangent, but sometimes you do not want an end user to have ANY access to the database, yet you want them to log in to your application. In an n-tier process, the middle tier "protects and regulates" access to the database and modification of data. In this situation, having a user table with roles defined, etc. is very useful. The .NET Framework really works well in this manner.
Client Application <---> Business Objects/Middle Layer/Security Layer <---> Data stores
ps - I'm not going to run you out of town...
November 24, 2004 at 8:59 am
My understanding of the conventional wisdom was that, for small recordsets, there is no difference in performance between set and cursor processing. In those cases, while set processing may be more elegant, cursor processing may be more readable and maintainable by other developers required to maintain your code.
Large recordsets (the tests I've seen say >10,000) should be processed with set processing unless there's no other way to accomplish the goal (i.e. the example above where unique evaluation of each row is required).
That having been said, I try to avoid cursor processing unless either 1) I can't find a set-based way to do what I need, or 2) a set-based process will be too obtuse for some of our less experienced developers to figure out should they have to support the app.
Finally, a request for assistance in this very area. I have to process a CSV list of integers that represent primary keys in a lookup table, then use that list to insert a set of rows into a table. I'm using some of the code that's cribbed from some SP/UFN's on the site to turn my CSV string into a table of integers. Now I need to insert a row into another table for each row in my table of integers. Does anyone know a set-based way to do that (without dynamic SQL) or am I limited to a cursor-based solution?
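For the final step, once the CSV has been parsed into a table of integers, no cursor (and no dynamic SQL) is needed: a plain INSERT ... SELECT joins the integer table to the lookup table. A sketch with invented table and column names (only the #Ints-from-CSV idea comes from the post above):

```sql
-- #Ints(Number int) holds the parsed CSV values; join to the lookup and insert one row each.
INSERT INTO TargetTable (LookupID, CreatedDate)
SELECT l.LookupID,
       GETDATE()
FROM #Ints i
JOIN LookupTable l
  ON l.LookupID = i.Number
```

The join also silently drops CSV values with no matching lookup row; add a NOT EXISTS check first if those should be errors instead.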
November 24, 2004 at 10:24 am
I'm confused on this:
CAST (SUBSTRING (I1.input_string
FROM S1.seq
FOR MIN(S2.seq) - S1.seq -1)
AS INTEGER)
in your SELECT statement. SUBSTRING should take string, start, and length operands. How do the FROM and FOR fit into SUBSTRING as you've coded it? I copied all the code into QA and ran it, and set up the SEQUENCE table. Everything is there, but the last INSERT INTO ... through GROUP BY ... won't even pass syntax checking in QA. It's yelling about the "FROM S1.seq" clause above.
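The FROM/FOR form is ANSI standard SQL syntax for SUBSTRING; T-SQL only accepts the comma-separated form, which is why QA rejects it. Translated to T-SQL, the expression above would presumably look like this (same identifiers as the quoted code; this still has to sit inside the original grouped query for the MIN to be legal):

```sql
-- ANSI: SUBSTRING(str FROM start FOR len)  ->  T-SQL: SUBSTRING(str, start, len)
CAST(SUBSTRING(I1.input_string,
               S1.seq,
               MIN(S2.seq) - S1.seq - 1) AS INTEGER)
```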
November 24, 2004 at 12:21 pm
I hope you all realize there's an implied challenge in that original post which can be looked at in two ways:
1. Provide an example of a situation where the cursor is going to execute faster than not using a cursor. (The example provided in a prior reply didn't include enough detail to prove that it couldn't be done in a faster-executing set manner.)
2. If there is such a situation where the cursor is necessary (theory not proven to me as yet), define those factors which dictate the cursor as the solution. i.e. what's the simple rule for when a cursor should be used? (And should we allow "when I've got way more CPU power and RAM than I need and not nearly enough users to care about... well, it was a quick and dirty..." kinds of justifications?)
(Also note: this is not a "row-at-a-time processing is never required" statement. --nobody picked up on that implication either: that there may be faster executing row-at-a-time methods).
November 24, 2004 at 1:14 pm
I've been watching this thread with interest. I've heard the arguments that virtually ANY cursor can be re-written as a set based operation, and I guess I have to say that I'm a skeptic. I'm, without a doubt, one of the guilty parties that have posted code on this site that utilizes cursors, never application related, always database maintenance related. While I have been working with SQL for about 5 years, I was a mainframe programmer (procedural) before starting with SQL. Set based coding has been a learning experience, to say the least. And from what I'm seeing here, I may have a long ways to go (keep in mind that I'm still a skeptic!)
John, in your most recent post, you said "this is not a 'row-at-a-time processing is never required' statement". I assume from that you mean that cursors truly are evil, and that if sequential processing is required, one should never use a cursor, but step through the data using a key and a while loop. When I first read your initial post, I got the impression that might be where you were going, but also felt equally strongly that it was primarily set based vs sequential. As to cursors vs while loops, personally, I don't see a lot of difference, and as a rule, if I can't see a set based solution, I don't discourage the use of a cursor. It's easier to code, easier to maintain, and I haven't seen a significant performance difference. Now, just maybe (remember, I'm a skeptic), the real issue should be that if I don't see a set based solution, I need to look longer and harder!?!?!
Steve
November 25, 2004 at 1:28 am
Finally, a request for assistance in this very area. I have to process a CSV list of integers that represent primary keys in a lookup table, then use that list to insert a set of rows into a table. I'm using some of the code that's cribbed from some SP/UFN's on the site to turn my CSV string into a table of integers. Now I need to insert a row into another table for each row in my table of integers. Does anyone know a set-based way to do that (without dynamic SQL) or am I limited to a cursor-based solution?
http://www.sommarskog.se/arrays-in-sql.html
In the meantime what about
SET NOCOUNT ON
USE tempdb
SELECT
TOP 256 Number = IDENTITY(INT)
INTO
MyNumbers
FROM
Sysobjects S1
CROSS JOIN
Sysobjects S2
DECLARE @s varchar(256)
SET @s = '1,,2,,,3,4,'
SELECT
RIGHT(LEFT(@s,Number-1),CHARINDEX(',',REVERSE(LEFT(','+@s,Number-1))))
FROM
MyNumbers
WHERE
Number BETWEEN 1 AND LEN(@s)
AND
SUBSTRING(@s,Number,1) = ',' AND SUBSTRING(@s,Number-1,1) <> ','
DROP TABLE MyNumbers
SET NOCOUNT OFF
If you don't want to have a separate number helper table (which isn't a bad thing anyway) and you're not having more than 256 values in your string, you can use
DECLARE @s varchar(256)
SET @s = '1,,2,,,3,4,'
SELECT
RIGHT(LEFT(@s,Number-1),CHARINDEX(',',REVERSE(LEFT(','+@s,Number-1))))
FROM master..spt_values
WHERE Type = 'P' AND Number BETWEEN 1 AND LEN(@s)
AND SUBSTRING(@s,Number,1) = ',' AND SUBSTRING(@s,Number-1,1) <> ','
On cursors:
Cursors are necessary! How else would you define what is good when you don't know what is evil?
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/
November 26, 2004 at 11:19 am
First, I have never used a cursor in 13 years of SQL Server support. That doesn't mean they don't have their place, but I've never used one. Moreover, they are the number one source of performance problems with vendor-written applications. Vendors test their code against 2000-row tables but forget that some of their customers have 20-million-row tables.
The separate security system debate would be more interesting.
- My users are world wide, don't all have Windows workstations, aren't in the same or trusted domains, and we want just one system.
- I can control application (virtual) and database (actual) security with one system.
- I can still do all the things Windows does (mandatory password changes, no reuse of previous 10 passwords, etc.).
- In large corporations, people accountable for local security have no control over who is in Windows groups and in many cases have no way to know who has access.
November 29, 2004 at 3:06 am
OK, having read all your comments, and despite my earlier post about there practically always being a set-based solution, there is one occasion where I used a cursor. It was this:-
I wanted to create an audit trail system. I had an audit table that consisted of a date-time column, the userID, the table being updated, the operation ("update", "insert" or "delete"), then 20 sets of columns called "Column1_OldValue", "Column1_NewValue", etc.
The challenge was, for every table in the database, to create an insert, update and delete trigger which did an insert into this audit table, putting the pre-changed values of each column into "ColumnN_OldValue" and the updated or inserted values of each column into "ColumnN_NewValue".
This procedure ran as a scheduled task every week so that any changes to the schema were automatically included in the audit process.
The only way I could achieve this was by nested cursors that queried the system tables so that (1) I could work with each table in turn, and (2) I could create a dynamic INSERT statement using each specific column name that was in the table.
Can anyone think of a way that this could have been done using a set-based solution? This is only out of interest, as I no longer work for the company that used this process.
Sam
November 29, 2004 at 3:15 am
Okay, this is an "administrative" one-time action. I think that's quite okay for such cases. The only alternative I can think of right now, might be sp_MSforeachtable. But if you look at the code behind that procedure guess what you find there. I believe this to be the reason why it is undocumented.
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/
November 29, 2004 at 11:29 am
I, like the last poster, come from a mainframe, procedural background, so I automatically think that way.
Recently someone came to me with the need to change many employee IDs, in a table, from one given unique value to a different given unique value. This ID is not the key to the table. I had them give me a spreadsheet with the key values, old ID, and new ID. I imported that to a work table and ran the following. Can this be done by set processing?
DECLARE @holdworkkey INT
DECLARE @holdworkold CHAR(6)
DECLARE @holdworknew CHAR(6)
DECLARE ConvertID CURSOR FOR
SELECT workkey, workold, worknew
FROM worktab
OPEN ConvertID
-- fetch the first row before testing @@FETCH_STATUS
FETCH NEXT FROM ConvertID
INTO @holdworkkey, @holdworkold, @holdworknew
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE prodtable
SET datacol = @holdworknew
WHERE keycol = @holdworkkey AND
datacol = @holdworkold
FETCH NEXT FROM ConvertID
INTO @holdworkkey, @holdworkold, @holdworknew
END
CLOSE ConvertID
DEALLOCATE ConvertID
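For reference, one set-based rewrite of the loop above would be a joined UPDATE, using T-SQL's UPDATE ... FROM extension (table and column names taken from the cursor code; an ANSI-portable form would use a correlated subquery instead):

```sql
-- Update every matching production row in one statement:
-- join on the key and the old value, take the new value.
UPDATE p
SET datacol = w.worknew
FROM prodtable p
JOIN worktab w
  ON w.workkey = p.keycol
 AND w.workold = p.datacol
```

Rows with no matching work-table entry are simply untouched, which matches the cursor version's behavior.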
November 30, 2004 at 9:05 am
Thanks Joe. I tried that and it worked great. Yes, I needed to leave the old key unchanged if there is no input value to change it to.
I had never used COALESCE. Correct me if I'm wrong: in English, this is saying - update the table, setting datacol to either the result of the SELECT, or to its existing value, whichever is not null. Is that right?
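The COALESCE pattern being described presumably looks something like this sketch (reconstructed from the description, not the exact posted code; names come from the cursor example earlier in the thread). The correlated subquery returns NULL when there is no matching work row, and COALESCE then falls back to the column's existing value:

```sql
-- Set-based update: use the work table's new value where one exists,
-- otherwise keep datacol unchanged.
UPDATE prodtable
SET datacol = COALESCE((SELECT w.worknew
                        FROM worktab w
                        WHERE w.workkey = prodtable.keycol
                          AND w.workold = prodtable.datacol),
                       datacol)
```

So yes, that reading is right: COALESCE returns its first non-null argument.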