July 3, 2007 at 3:54 pm
Hi, Forum!
I'm optimizing a SQL 2000 database that uses a lot of cursors of different sizes; some run against small tables (<500 rows) and others against big tables (>1,000,000 rows).
I was wondering if you have some general tips for eliminating, or at least mitigating, the effects of these cursors.
Thank you!
July 3, 2007 at 7:30 pm
Very hard to say... people normally use cursors when they think there is no set-based option... normally, they're wrong. Many replacements rest on the ill-conceived notion that a temp table and a WHILE loop are an effective substitute... again, wrong, because that's just about what a cursor is, and it will normally only achieve a 5 to 10% gain in performance.
The very hard part will be figuring out what the cursor actually does and coming up with a high-speed set-based solution, because a lot of folks think a set-based solution only means avoiding the loop. That's not always true, especially when folks try to use a correlated subquery or a triangular join (WHERE someID <= someOtherID) to replace the cursor.
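Just to show what I mean, here's a quick sketch of a triangular join (dbo.SomeTable and SomeID are made-up names, just for illustration). It looks set-based, but for a running count, row 1 reads 1 row, row 2 reads 2 rows, and so on... N rows cost on the order of N*(N+1)/2 internal reads. That's hidden RBAR:
--===== Hypothetical triangular join... looks set-based, runs like RBAR
 SELECT t1.SomeID,
        RunningCount = COUNT(*)   --counts every row up to and including t1's row
   FROM dbo.SomeTable t1
  INNER JOIN dbo.SomeTable t2
     ON t2.SomeID <= t1.SomeID    --each row joins to ALL the rows before it
  GROUP BY t1.SomeID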
Then "general" advice that I would give identifies the "paradigm shift" between cursor based thinking and set based thinking... stop thinking about what you can do to a row... start thinking about what you can do to a column. Set-based thinking is about columns, not rows.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 4, 2007 at 2:23 am
General tips? Well, especially if cursors are used as often as you describe, be very careful. Some other functionality that doesn't include any cursor itself may still rely on row-by-row processing. A good example is a trigger on a table that is updated only by the application, through cursors... such a trigger can be written in a way that does not support multi-row processing. It can even happen (speaking from my own experience with third-party software) that the trigger has a special branch for handling a possible multi-row update, but that branch doesn't work - nobody noticed before, because all updates were row-by-row.
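To illustrate what such a trigger might look like (the table, column, and trigger names here are made up, just a sketch of the pattern): the variable assignment below picks one arbitrary row from the INSERTED table, so a multi-row update silently skips the rest:
-- Sketch of a trigger written with a single-row assumption
 CREATE TRIGGER trSomeTable_Update ON dbo.SomeTable
    FOR UPDATE
 AS
DECLARE @ID INT
 SELECT @ID = SomeID FROM INSERTED  -- grabs just ONE row's ID, whichever comes back
 UPDATE dbo.SomeAudit
    SET LastUpdated = GETDATE()
  WHERE SomeID = @ID                -- the other updated rows are never audited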
Benefits of the shift from cursors to set-based processing can be immense and are well worth the work you invest. But, as Jeff said, only if you can come up with a GOOD set-based solution, which is not always possible (or at least not easy) in an environment that was originally designed for cursors. So, test, test, test... and compare the results.
July 4, 2007 at 7:56 am
Thanx Jeff & Vladan.
One of the main problems I have is that, as Vladan states, many SPs do inserts and updates on tables with triggers, so I'm doing this with a lot of caution (and a little bit of fear hehehe).
I guess the hardest part for me is trying to understand someone else's logic and trying to figure out why he (or she) coded the SPs that way.
Well, wish me luck and thanks again!
July 4, 2007 at 11:22 am
"I guess the hardest part for me is trying to understand someone else's logic and trying to figure out why he (or she) coded the SPs that way."
Oh yeah... you hit the nail on the head... this is why good documentation in the code is so important... doesn't necessarily have to be complex or long documentation to be good documentation, but if it's missing, everything from simple maintenance to major rewrites becomes a whole lot harder.
(Sorry for the length of this post, but it seemed like a good place to say all of this.)
I just got done with a cursor conversion at the following URL (original cursor on the previous page) if you want to check one out... dunno how much faster it actually is because I couldn't test it... didn't have the data.
http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=338&messageid=377881&p=2
Just remember the "hidden" code in triggers like Vladan pointed out. If it were me, the trigger code is the first thing I'd convert from RBAR (pronounced "ree-bar", a "Modenism" for "Row By Agonizing Row") to good set-based code so I wouldn't have to worry about it ever again. Then, I'd find out what the most frequently used code is, and convert in that order.
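For what it's worth, a set-based rewrite of the kind of trigger Vladan sketched (same made-up table names as his example) just joins to the whole INSERTED table instead of pulling one ID into a variable... 1 row or 10,000 rows, same single statement:
-- Set-based version of the single-row sketch above... handles any number of rows
 CREATE TRIGGER trSomeTable_Update ON dbo.SomeTable
    FOR UPDATE
 AS
 UPDATE a
    SET LastUpdated = GETDATE()
   FROM dbo.SomeAudit a
  INNER JOIN INSERTED i             -- every updated row, not just one
     ON a.SomeID = i.SomeID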
And, converting cursors to set-based is usually a real bugger, so don't get discouraged!!! Remember that you're trying to figure out (usually without any documentation) what some code does that was complex enough to stymie some other developer badly enough to make them think the only way to get it done was to resort to RBAR as either a cursor or a loop.
But it's worth it because, in the long run, speed IS important... what do you think the CPU or hard drive is doing during long-running code? One or the other or both is being used extensively... would you rather have it take 2 minutes to run or 2 seconds to do the same thing? Would you rather have a 10- to 24-hour batch run, or a 23-minute run?
Probably the best advice for writing set-based code is what I've already stated... think of what you need to do to a column of information, not a row. The other advice is that shorter code will not always be faster code. For example, a single SELECT with joins to 12 tables may not be nearly as fast as getting the "base" rows of the SELECT into a temp table and then using that greatly reduced rowcount of "driver rows" to power other individual UPDATEs, INSERTs, or SELECTs. In other words, "Divide and Conquer"... just like in any other code. SQL is no different.
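Here's a bare-bones sketch of that "Divide and Conquer" idea (dbo.Orders, dbo.OrderDetail, and the column names are all made up)... get the small set of driver rows once, then let the temp table power the heavy statement:
--===== Isolate the "driver rows" first... whittle millions of rows down early
 SELECT CustomerID, OrderID
   INTO #Driver
   FROM dbo.Orders
  WHERE OrderDate >= '20070101'

--===== Then use the much smaller temp table to power the real work
 UPDATE od
    SET Processed = 1
   FROM dbo.OrderDetail od
  INNER JOIN #Driver d
     ON od.OrderID = d.OrderID

   DROP TABLE #Driver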
The other very important thing is, you don't know what you don't know... if you don't know all the tools available in SQL Server (functions, techniques, etc.) and don't have the ability to (sorry about the cliché) "think outside the box", you're gonna be like a lot of folks... you'll "give up" and use some form of RBAR to solve problems instead of figuring out the high-speed set-based solution. Take time to study ALL the functions in SQL... you don't have to memorize them... just know kinda what they do so you can look them up when you want to think outside the box.
And even some code that supposedly requires a "loop" can be done in a high-speed set-based fashion. I guess a simple demonstration is in order here... I apologize for the really stupid-simple example, but this is one of my favorites for demoing set-based thinking...
Problem: Create a table with 1 million random numbers between the values of 1 and 10 with the following exceptions: every 5th number shall be the number 100 and every 10th number shall be the number 200. The "count" must be stored in the table for both verification and "selection/join" purposes. The "1 million" count must also be programmable through a variable so we can turn this into a stored procedure if required in the future.
Guess I'm obliged to demo the "loop" version that most folks would come up with first... this is row-based thinking... and it takes a comparatively long time to run... many folks would solve this problem using a WHILE loop because, as everyone knows, the use of RAND() produces the exact same number throughout any given query, so it just can't be done set-based... right? Here's the WHILE loop code... takes almost 50 seconds to run on my machine and locks the CPU at 100% for the entire duration...
--===== Setup the environment
    SET NOCOUNT ON --Suppress the autodisplay of rowcounts
    USE TempDB     --Change to a database where we can't do any harm for the demo

--===== Declare and set a variable to measure the run duration
DECLARE @StartTime DATETIME
    SET @StartTime = GETDATE()

--===== Declare some obviously named variables
DECLARE @Counter INT
DECLARE @MaxCount INT
    SET @MaxCount = 1000000

--===== Create the test table
     -- If the test table already exists, drop it
     IF OBJECT_ID('jbmTestRandomCursor','U') IS NOT NULL
        DROP TABLE jbmTestRandomCursor

     -- Create the test table
 CREATE TABLE jbmTestRandomCursor
        (
        TheCount  INT NOT NULL,
        TheNumber INT
        )

--===== Solve the problem using the fastest WHILE loop I can come up with
    SET @Counter = 1
  WHILE @Counter <= @MaxCount
  BEGIN
        --===== Calculate the random number output for this count
         INSERT INTO jbmTestRandomCursor
                (TheCount, TheNumber)
         SELECT TheCount  = @Counter,
                TheNumber = CASE
                            WHEN @Counter % 10 = 0 THEN 200
                            WHEN @Counter % 5  = 0 THEN 100
                            ELSE CAST(RAND()*10+1 AS INT)
                            END

        --===== Increment the loop counter
            SET @Counter = @Counter + 1
    END

     -- Add a named primary key based on the TheCount column
  ALTER TABLE jbmTestRandomCursor
    ADD CONSTRAINT PK_jbmTestRandomCursor_TheCount PRIMARY KEY (TheCount)

--===== Display the run duration
  PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' Duration in milli-seconds'
GO
Ok... with a little bit of extra knowledge about some SQL functions, some knowledge about IDENTITY columns, and some knowledge about cross joins, here's the set-based solution... does exactly the same thing but in about one-fifth of the time... yep... it also pegs the CPU... but which would you rather have... a pegged CPU for nearly 50 seconds or a pegged CPU for just over 10 seconds???
--===== Setup the environment
    SET NOCOUNT ON --Suppress the autodisplay of rowcounts
    USE TempDB     --Change to a database where we can't do any harm for the demo

--===== Declare and set a variable to measure the run duration
DECLARE @StartTime DATETIME
    SET @StartTime = GETDATE()

--===== Declare some obviously named variables
DECLARE @MaxCount INT
    SET @MaxCount = 1000000

--===== Create and populate the test table with a list of sequential numbers all at once
     -- If the test table already exists, drop it
     IF OBJECT_ID('jbmTestRandomSetBased','U') IS NOT NULL
        DROP TABLE jbmTestRandomSetBased

     -- Limit the number of rows to build
    SET ROWCOUNT @MaxCount

     -- Create and populate the test table with sequential numbers, on the fly
 SELECT TheCount  = IDENTITY(INT,1,1),
        TheNumber = CAST(NULL AS INT)
   INTO jbmTestRandomSetBased
   FROM Master.dbo.SysColumns sc1 WITH (NOLOCK),
        Master.dbo.SysColumns sc2 WITH (NOLOCK)

     -- Return to normal unlimited rowcounts
    SET ROWCOUNT 0

     -- Add a named primary key based on the TheCount column
  ALTER TABLE jbmTestRandomSetBased
    ADD CONSTRAINT PK_jbmTestRandomSetBased_TheCount PRIMARY KEY (TheCount)

--===== Update the TheNumber column in the test table with the correct random numbers
 UPDATE jbmTestRandomSetBased
    SET TheNumber = CASE
                    WHEN TheCount % 10 = 0 THEN 200
                    WHEN TheCount % 5  = 0 THEN 100
                    ELSE CAST(RAND(CAST(NEWID() AS VARBINARY))*10+1 AS INT)
                    END

--===== Display the run duration
  PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' Duration in milli-seconds'
In the WHILE loop code, we were solving for everything that would live on any given row. In the set-based example, we first worried about how to create the TheCount column and then we worried about how to get the correct random numbers into the TheNumber column. Since we couldn't populate the TheNumber column until the TheCount column was populated, we had to do it in 2 steps instead of one. But it's still 5 times faster than the WHILE loop. Like I said, shorter code is not always faster code.
This is actually a BAD example for showing how much faster set-based can be than cursor-based, because it's only 5 times faster. My latest conversion at work was a dupe check/dupe removal/dupe archive process across 30 different databases, on common tables that each contained more than 4 million rows (1 month back) per database. The cursor-based solution took 10 to 24 hours to run, depending on how many times it crashed due to blocking, timeouts, and other factors.
The set-based solution I wrote goes 3 months back (90 daily third-party databases total) and takes only 23 minutes... that's 3 times the work and still 20 to 60 times faster (so, 60 to 180 times faster if you account for the extra work?).
Remember these when trying to do a cursor conversion to set-based:
- Think about what you need to do to a column, not a row.
- Shorter code is not always faster code... "Divide and Conquer" with a temp table of "driver rows" where it helps.
- Convert the "hidden" RBAR in triggers first, then the most frequently used code.
- Get to know what ALL the functions in SQL Server do, at least roughly, so you can think outside the box.
Hope this helps...
--Jeff Moden
Change is inevitable... Change for the better is not.
July 4, 2007 at 11:43 am
Here's a much shorter version of the same idea. I once had a trigger that was updating a column in a table RBAR-style (select the row ID, then update 1 column in the base table where ID = @ID). On a 10K-row INSERT statement, the code took 45 minutes to run. The set-based version of the trigger: 0.1 sec. That's around 25,000 times faster. This may be on the more extreme side of things, but you still get the idea.
July 5, 2007 at 1:32 pm
Thank you for all your comments... they're very helpful and they've helped me get started on this (frightening =P) task.
Thanks again guys, and sorry for the delay in my response.
July 5, 2007 at 6:23 pm
Heh... if you run into a toughie, c'mon back... lots of good folks waiting in the wings...
--Jeff Moden
Change is inevitable... Change for the better is not.