January 17, 2006 at 8:27 am
Agreed with Steve. This method may work for the author, but it is not the one I would choose or recommend; it is a last resort if nothing else can be improved, or if you have only five minutes to spend on improvement. Throwing hardware at a performance problem is costly and dangerous, because the next step will be to purchase ever more powerful servers instead of fixing the bad code. In my experience, 90% of performance issues come down to bad code or bad architectural design.

In this case, instead of swapping a cursor for a temp table, I would look for a way to remove the looping construction entirely and produce set-based code driven by the date rather than by the individual order. In most cases that gives you not 75% but a 2 to 10 times improvement. A loop should be the last thing you consider, especially once the row count passes a certain number (let's say 200-300, maybe even lower) or when the number of rows is unpredictable. But in some cases looping is the only choice!

And one last note: performance is not only a function of the elapsed time a process (stored procedure) takes, but in a lot of cases of the way a process interacts with other processes in a multi-user environment, especially when we are talking about batch processes, data load processes, and reports.
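For illustration, the kind of rewrite I mean looks like this (the table and column names are made up): one set-based UPDATE driven by the date range, instead of a loop that touches each order individually.

DECLARE @FromDate DATETIME, @ToDate DATETIME
SET @FromDate = '20060101'
SET @ToDate   = '20060201'

-- One set-based statement covers the whole date range at once;
-- no cursor, no WHILE loop, no row-by-row work
UPDATE o
SET    o.Status = 'Processed'
FROM   dbo.Orders o
WHERE  o.OrderDate >= @FromDate
  AND  o.OrderDate <  @ToDate
  AND  o.Status    = 'Pending'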
Leo Peysakhovich
January 17, 2006 at 8:35 am
Why not use xp_execresultset?
E.g.,
EXEC master..xp_execresultset N'SELECT
''EXEC dbo.my_procedure '' + CAST(param_1 AS VARCHAR) + '','' + CAST(param_2 AS VARCHAR)
FROM (SELECT param_1, param_2 FROM MyDB..MyTable) temp', N'MyDB'
This keeps everything set-oriented and should give you much better performance than a cursor or WHILE loop. In addition, the code is less verbose.
January 17, 2006 at 8:36 am
I agree w/ Mr Hicklin; a 75% performance improvement is highly dubious.
I would understand it if performance was perhaps 75% WORSE, though. When inserting data into a table variable, SQL Server will not use parallelism; I'm not aware of any other situation where this restriction applies. So if the query that populates the cursor/table variable is large or complex enough to benefit substantially from parallelism, then the author's suggested approach will perform worse than the cursor.
Using a true #temp table instead of a table variable is a workaround for that... then I'd expect to see the same performance for both the cursor and fake-cursor approaches.
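To make the workaround concrete (BigOrders and its columns are made-up names):

-- Table variable: the INSERT that populates it cannot use a parallel plan
DECLARE @keys TABLE (OrderID INT PRIMARY KEY)
INSERT INTO @keys (OrderID)
SELECT OrderID FROM dbo.BigOrders WHERE Total > 1000

-- True #temp table: the populating SELECT is free to go parallel
SELECT OrderID
INTO   #keys
FROM   dbo.BigOrders
WHERE  Total > 1000

CREATE CLUSTERED INDEX IX_keys ON #keys (OrderID)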
I see no benefit to this approach at all.
January 17, 2006 at 8:42 am
On a different note, how do you approach eliminating a cursor for "batch" inserting correlated items where the parent item uses an identity as its PK (for example, inserting a batch of orders and the corresponding OrderItems)?
I have two approaches, but neither is very elegant:
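One is to loop over the staged parents one at a time and grab each new id with SCOPE_IDENTITY(). A rough sketch (the staging tables and column names are made up):

DECLARE @stageID INT, @newOrderID INT

-- #stagedOrders / #stagedOrderItems are hypothetical staging tables
-- holding the incoming batch, keyed by StageID
SELECT @stageID = MIN(StageID) FROM #stagedOrders
WHILE @stageID IS NOT NULL
BEGIN
    INSERT INTO dbo.Orders (CustomerID, OrderDate)
    SELECT CustomerID, OrderDate FROM #stagedOrders WHERE StageID = @stageID

    SET @newOrderID = SCOPE_IDENTITY()   -- identity of the order just inserted

    INSERT INTO dbo.OrderItems (OrderID, ProductID, Qty)
    SELECT @newOrderID, ProductID, Qty
    FROM   #stagedOrderItems
    WHERE  StageID = @stageID            -- children staged under the same key

    SELECT @stageID = MIN(StageID) FROM #stagedOrders WHERE StageID > @stageID
END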
The other approach is with the help of triggers, but I'm not very fond of using triggers.
January 17, 2006 at 8:48 am
Yeah, thanks for that, cleared a few things up for me too... it's amazing what can be learnt from a simple little question.
Cheers
Bob
January 17, 2006 at 9:42 am
Temp tables will suffer from the same performance problems as cursors, and they increase concurrency problems. In older versions of SQL Server, the use of temp tables caused horrible problems due to locking in tempdb. The temp table method is not that great... I will post a good solution to this problem later today...
Jasmine
January 17, 2006 at 10:59 am
Surprisingly enough, when doing mass UPDATES or DELETES on very large tables, I've found that using a WHILE loop to limit the number of rows affected at one time gives a performance increase. I suspect it's mostly due to logging. As an example, I recently had to perform a massive, selective DELETE on a table with 33 million records:
DELETE FROM myTable
WHERE LastName = ''
What I found was that the DELETE took an extremely long time on my (admittedly) woefully underpowered development machine. So I added a WHILE loop to break the delete up into chunks:
DECLARE @start INT
DECLARE @end INT

SET @start = 0
SET @end = 33000000    -- roughly the highest rowID in the table

WHILE @start <= @end
BEGIN
    -- Delete one million-row band of the rowID range at a time,
    -- keeping each transaction (and its log activity) a manageable size
    DELETE FROM myTable
    WHERE rowID > @start AND rowID <= @start + 1000000
      AND LastName = ''

    SET @start = @start + 1000000
END
I agree that trading a cursor in for a WHILE-loop row-by-row iterator probably doesn't provide anywhere near the benefit that a good set-based solution will. And I find the author's results suspect, although I'll have to wait until I get home to test them.
January 17, 2006 at 11:15 am
SQL is set-based. Everything is done in sets. Almost anything (with very, very, very few exceptions) that you can do with CURSORS can be done with set-based SQL commands + a little imagination. SQL is a "declarative" language; i.e., you tell SQL what you want and the SQL engine finds the best way to achieve the desired results. This can be a strange notion for programmers coming from "imperative" languages like C++, VB, etc.
One of the best comparisons I've seen is the "cab driver" analogy. The "imperative" method is analogous to jumping into a cab and giving the cab driver detailed directions to get you to the airport ("turn left at the next light", "turn right at the corner", "stop at this light", "get off at this exit", etc.) You achieve your desired result by telling the cab driver exactly how to accomplish each and every step along the way.
With the "declarative" method you trust that the cab driver knows how to get to the airport. You tell the cab driver "take me to the airport", and assume he will find the best route. You assume the cab driver knows many things that you might not be aware of (i.e., traffic patterns at this time of day, recent accidents, weather, shortcuts, etc.) In SQL, you define your end results and the SQL engine determines the best path from point A to point Z; the SQL engine also takes a lot of factors into account which you may or may not know of, or care about.
When you use CURSORS, you're stepping back into an imperative mindset. You are telling the SQL engine how to do its job, one row at a time. This kills a lot of the performance benefit you would get by letting SQL Server arrange and optimize the details of your request for you.
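To put the analogy into code (with illustrative table and column names), compare spelling out every step against simply stating the result you want:

-- Imperative: tell the engine how, one row at a time
DECLARE @total MONEY, @amount MONEY
SET @total = 0
DECLARE amt_cur CURSOR FOR
    SELECT Amount FROM dbo.Payments WHERE CustomerID = 42
OPEN amt_cur
FETCH NEXT FROM amt_cur INTO @amount
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @total = @total + @amount
    FETCH NEXT FROM amt_cur INTO @amount
END
CLOSE amt_cur
DEALLOCATE amt_cur
SELECT @total AS Total

-- Declarative: state the result and let the engine pick the plan
SELECT SUM(Amount) AS Total
FROM   dbo.Payments
WHERE  CustomerID = 42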
January 17, 2006 at 11:22 am
Exactly right. Trusting a TABLE variable (as opposed to a temp table, or any other table for that matter) not to use tempdb is a fallacy: nothing in the documentation specifies that TABLE variables do *not* use tempdb, so the belief that they won't affect tempdb is not a valid assumption.
January 17, 2006 at 11:33 am
I have seen this approach many times and I must say that I hate it. Avoiding cursors is a good thing, but this approach should be called "roll your own cursor." It isn't cursors that should be avoided so much as row-at-a-time operations. If there is no alternative (and sometimes there isn't), use a cursor, for heaven's sake! Particularly with large record sets, a firehose cursor will outperform the temp table/WHILE loop trick, unless you need an index on the temp table; but even then, if you have to loop through the records, just use a cursor.
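By "firehose" I mean the forward-only, read-only flavor; the declaration looks something like this (names are illustrative):

-- FAST_FORWARD = forward-only + read-only, the cheapest cursor option
DECLARE order_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT OrderID FROM dbo.Orders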
/*****************
If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek
*****************/
January 17, 2006 at 11:41 am
As others have mentioned, STATIC or LOCAL STATIC cursors resolve many of the performance issues seen with cursors.
I did once see an indication that non-STATIC cursors behave in a significantly non-optimal way. I was running the same basic routine with slightly different selection criteria, and using a non-static cursor. The table I was running against held several million rows; my first selection criteria should have covered many more rows than my second. However, both were taking a very long time.
When I investigated the actual numbers of records returned, the first selection criteria returned something like 15,000 records. The second returned 4. Yes, just 4. And it was taking hours to do. This has led me to believe that non-static cursors do something very peculiar, and that their performance is tied more to the size of the tables they're using than to the size of the result set.
I use LOCAL STATIC cursors whenever I need to use cursors.
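That is, something like this (the query and names are just placeholders):

-- STATIC materializes the result set into tempdb once, up front, so the
-- cursor's cost tracks the result set rather than the underlying tables
DECLARE name_cur CURSOR LOCAL STATIC FOR
    SELECT LastName FROM dbo.Customers WHERE Region = 'West'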
Also: sometimes readability and understandability are more important than outright speed in a query. When others will have to work with it far longer than you, or when it runs infrequently but needs modification every time it's used, using the method that is clearest to others can make the query much easier to maintain and use in the future.
R David Francis
January 17, 2006 at 12:15 pm
I published this technique in May 2005 (check http://www.sqlservercentral.com/columnists/aGrinberg/thearrayinsqlserver2000.asp). There is a problem with the TABLE variable: if you have a big row set, the TABLE variable will slow down performance. In that case a temporary table is a better choice, but if the row set is very big (more than 100,000 rows) I would recommend creating a table with a clustered index on the column with the identity property (in this case it serves as the row pointer), usually the first column. If you have a chance, try it and you will see the difference in speed.
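Something along these lines (illustrative names):

-- Permanent work table: the identity column is the row pointer, and the
-- clustered index makes each TOP 1 ... WHERE RowPointer > @n seek cheap
CREATE TABLE dbo.WorkQueue (
    RowPointer INT IDENTITY(1,1) NOT NULL,
    OrderID    INT NOT NULL,
    CONSTRAINT PK_WorkQueue PRIMARY KEY CLUSTERED (RowPointer)
)

INSERT INTO dbo.WorkQueue (OrderID)
SELECT OrderID FROM dbo.Orders   -- the big row set to walk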
January 17, 2006 at 12:41 pm
DECLARE @var INT, @othervar VARCHAR(100)
SET @var = -2147483648   -- seed below the smallest possible key value

WHILE 1 = 1
BEGIN
    SELECT TOP 1
        @var = var,
        @othervar = othervar
    FROM dbo.myTable     -- placeholder name ("table" itself is a reserved word)
    WHERE var > @var     -- walk the key column in order
    ORDER BY var

    IF @@ROWCOUNT = 0 BREAK

    PRINT 'do something to ' + @othervar
END
That's as stripped down as it gets for a loop. Now isn't that easier to read than cursor code? I realize that most of the time loops aren't needed in SQL, but sometimes they are, and then they should be super simple and clear. Imagine making a Java or C# programmer write all that cursor code to do a loop; they would laugh in our faces.
Signature is NULL
January 17, 2006 at 12:44 pm
I was working on a conversion project that required processing each record in several tables. The process had to start and finish inside of 48 hours. The overhead of cursors took too much time and was too memory intensive. Performance increased dramatically with the approach listed below: the query plan is reused again and again, and the key column should naturally be indexed for lightning-fast results.
declare @id int    --key column identifier
declare @rows int  --rows returned by the select statement

set @id = -1       --start one position less than the smallest id
set @rows = 1      --set to 1 so that the loop body runs at least once

while @rows > 0
begin
    select top 1
        @id = orderid          --assign the next identifier to @id
    from dbo.[Order Details]
    where @id < orderid        --only ids we have not processed yet
    order by orderid           --ensure identifiers are processed in order

    set @rows = @@ROWCOUNT     --IMPORTANT: must be the first statement after the select
    if @rows > 0
    begin
        print @id              --execute the stored procedure here to process @id
    end
end
Calvin, I'm with you, bro.