October 18, 2008 at 2:34 am
Dear Sir,
I need a help regarding the aboved subjec. I can't clear my self after reading a sql server book.
Please help me regarding the above subject. what is the cursor and Why we use the Cursor and what is utility and why we use while @@fetch_status=0 in this case
Thanking You
Debasis Bag
October 18, 2008 at 3:18 am
They're used to do row-by-row processing. In general, that's a very poor way to do things in SQL, as the database engine works better on sets to rows than multiple single rows.
Personally, the only thing I will use a cursor for is administrative tasks (like backup DB, checkDB) that have to operate on one item at a time.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 18, 2008 at 10:59 am
deba_20032004 (10/18/2008)
Dear Sir,I need a help regarding the aboved subjec. I can't clear my self after reading a sql server book.
Please help me regarding the above subject. what is the cursor and Why we use the Cursor and what is utility and why we use while @@fetch_status=0 in this case
Thanking You
Debasis Bag
I agree with Gail... cursors are to do the ol' ISAM thing of read a row, process a row, write a row... which is not the way most databases are built to work and that makes them slow.
Essentially, a cursor is nothing more than creating and populating a temp table, and then reading 1 row from the temp table, doing some process, and then incrementing a counter all in a WHILE loop which also is RBAR (see my signature line below for what that means).
What makes a cursor particularly bad is, unless you identify it as a READ ONLY, FORWARD ONLY cursor, it can have hooks into the actual underlying tables themselves and that can cause some pretty nasty blocking. Even if it doesn't, it's still going to be very slow compared to "set based" programming.
About 99.99% of the time, there is some set-based solution for the problem (whatever it is). The problem is finding the set-based solution... most people just can't get over the idea of processing "Rows". In databases not specifically designed to handle ISAM (Indexed Sequential Access Methods), processing row by agonizing row is a terrible and certain performance and scalability killer.
Even when, for certain DBA procs that must traverse multiple databases or multiple tables and the occasional proc to traverse multiple customer databases, that a cursor seems like it should be used, a cursor should never be used to process a single row. Rather, it should be used to control what happens to multiple SETS. Even then, with the advent of VARCHAR(MAX) and some high speed dynamic SQL/command concatenation, there's no real justification for ever using cursors or even WHILE loops.
The paradigm shift from RBAR/ISAM methods to Set-Based programming is actually pretty simple... instead of thinking about what you want to do in each row, think about what you want to do to each COLUMN in a SETof data. Sure, there are some "tricks of the trade" like using a Tally table or ROW_NUMBER() to pull off some code to solve what looks like needs to be RBAR, but they're well worth taking the time to learn.
The really hard part about making the shift is that you need to understand that just because something doesn't have a WHILE loop in it, IT STILL MIGHT [font="Arial Black"]NOT [/font]BE SET BASED. A classic example of were this happens is when people go to make a "Running Total"... what they normally come up with is a correlated subquery with an inequality in it and the number of internal rows that it spawns makes a cursor look like the only way to do such a thing. Patently, that's not the case. "Recursion" is another form of hidden RBAR that will kill performance... it works out to be nothing but a loop with no WHILE statement. Depending on how they are written, UDF's (User Defined Functions) can also take perfectly good set based code and just throw it on the floor turning perfectly good code into a hidden RBAR performance nightmare. Another form of hidden RBAR is Cross Apply, which is nothing more than a correlated subquery. If it has a UDF in it or an inequality, you need to test it very carefully for performance because it can qualify as RBAR on "sterioids" making it even worse than a cursor. They're not all bad, but do understand that they are ALL a form of hidden RBAR and must be used very carefully and tested for both performance and scalability.
Are cursors and WHILE loops bad? No... for the right job, they are sometimes (like 0.01% of the time) the only way to do something in SQL Server 2000 and less... the problem is that most people can't make the paradigm shift to think in COLUMNs instead of rows and will, too quickly, resort to some form of RBAR because it's easier for them to understand and, therefore, quicker for them to write which usually makes the boss happy because of "time to market" and other poorly planned schedules. They never consider how bad performance will be nor how unscalable the solution will be when they get lazy and go for the RBAR solution. Heh... and saying that "the table will never contain more than 1000 rows" is NOT justification for using RBAR... despite anyone's best intentions, either the table WILL get bigger or someone will use your canned RBAR code as a programming model for a larger table.
For the record, I've never put a cursor into production and I almost never use a WHILE loop except to control how SETS of information are processed. The only place where I've ever actually used a real RBAR WHILE loop is to prepare a relatively static hierarchy for using the "Nested Set" method of set based lookups.
There's a lot of ways to avoid almost all RBAR although some hidden RBAR does creep in to even the best of set based programming. Some of those methods for some common tasks can be found in the articles I've written and they can be found at the following link...
http://www.sqlservercentral.com/Authors/Articles/Jeff_Moden/80567/
The bottom line is, almost never write explicit RBAR like cursors and WHILE loops and be very, very careful to consider what may constitute hidden RBAR like UDF's, Recursion, Cross Apply, and Correlated Subqueries. And, to answer the original question, for the most part, and with very, very few extraordinary exceptions, cursors have NO utility in SQL Server. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
October 18, 2008 at 11:51 am
Jeff Moden (10/18/2008)
The bottom line is, almost never write explicit RBAR like cursors and WHILE loops and be very, very careful to consider what may constitute hidden RBAR like UDF's, Recursion, Cross Apply, and Correlated Subqueries.
The optimiser's pretty good with correlated subqueries these days. I was doing some tests for a presentation a month or so back, and I was looking for an example where a correlated subquery performed differently to an equivalent derived table. (execution plan, stats time, stats IO). I couldn't find a single example where the correlated subquery actually was executed per row of the outer query.
Unless, of course, I've misunderstood what you mean by a correlated subquery.
Simple example (based off adventureworks)
select productId, productnumber,
(select SUM(LineTotal) SumTotal from Sales.SalesOrderDetail sd where sd.productID = p.productid ) SumTotal
from Production.Product p
option (maxdop 1)
select p.ProductID, ProductNumber, SumTotal
from Production.Product p
left join (
select productid, SUM(LineTotal) SumTotal from Sales.SalesOrderDetail group by ProductID
) SalesTotals on p.ProductID = SalesTotals.ProductID
option (maxdop 1)
I've added extra rows to the SalesOrderDetail , so my results are probably different to anyone elses. (The maxdop 1 is there because both tend to parallel and I don't want added complexity.)
Correlated subquery:
Table 'Worktable'. Scan count 0, logical reads 0
Table 'Product'. Scan count 1, logical reads 4
Table 'SalesOrderDetail'. Scan count 1, logical reads 18316
SQL Server Execution Times:
CPU time = 1922 ms, elapsed time = 2052 ms.
Derived Table
Table 'Worktable'. Scan count 0, logical reads 0
Table 'Product'. Scan count 1, logical reads 4
Table 'SalesOrderDetail'. Scan count 1, logical reads 18316
SQL Server Execution Times:
CPU time = 1922 ms, elapsed time = 2064 ms.
There are 504 rows in the Product table, so if the correlated subquery was running once for each row of the outer, I would expect to see Sales Order Detail with a scan count of 504 and a logical reads of over 9 million for the first query.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 18, 2008 at 5:04 pm
Good information above.
When you fetch data from a cursor, you're grabbing a row. That returns a status code (OK, end of cursor, etc) and you check fetch_status to determine if you've read the last row.
October 18, 2008 at 5:30 pm
GilaMonster (10/18/2008)
The optimiser's pretty good with correlated subqueries these days. I was doing some tests for a presentation a month or so back, and I was looking for an example where a correlated subquery performed differently to an equivalent derived table. (execution plan, stats time, stats IO). I couldn't find a single example where the correlated subquery actually was executed per row of the outer query.
Indeed, the optimiser is great with correlated subqueries... if they're written correctly, which your's are. Heh... I've been known to use correlated subqueries myself. But, if you check back in what I wrote, here's what I said...
"what they normally come up with is a correlated subquery [font="Arial Black"]with an inequality in it [/font]and the number of internal rows that it spawns makes a cursor look like the only way to do such a thing"
Your's don't have an inequality in it. The first one is basically just a misplaced, pre-aggregated, derived table. Try doing, say, a running total using a correlated sub-query with an inequality in it and you'll find that example of "where the correlated subquery actually was executed per row of the outer query".
In fact, here's an example that you can run...
SELECT sod1.SalesOrderDetailID,
sod1.ProductID,
sod1.ModifiedDate,
sod1.LineTotal,
(SELECT SUM(sodrt.LineTotal)
FROM sales.SalesOrderDetail sodrt
WHERE sodrt.SalesOrderDetailID <= sod1.SalesOrderDetailID) AS RunningTotal,
(SELECT COUNT(sodrc.LineTotal)
FROM sales.SalesOrderDetail sodrc
WHERE sodrc.SalesOrderDetailID <= sod1.SalesOrderDetailID) AS RunningCount
FROM sales.SalesOrderDetail sod1
WHERE sod1.SalesOrderDetailID <= 10000 --Limit the experiment so it actually finishes sometime this week!
ORDER BY sod1.SalesOrderDetailID
Notice the limit I put on it so that it actually finishes sometime this week... 😛
Now, if you look at the "Estimated execution plan", you'll see some super skinney 1 row arrows. Those mean that each correlated sub-query will be executed once for each row and are just a tiny hint of the [font="Arial Black"]HUGE [/font]performance problem that awaits this code. If you change the limiter I put in the code above to only 500, everything looks hunky dory and the code actually "seems" to run pretty fast. In fact, it only takes about 3 seconds. That's where most people wrap things up and say they're done... and have left a time bomb in their code... (FIRST indication is that it should NEVER take 3 seconds to return a lousy 500 rows) 😉
Change the limiter to 1000 and run it again... wow... only a second more for twice the data... we must've written some really good code to only take a 3rd longer on twice the data. Um... but look at the "Actual Execution Plan"... where'd those arrows with 500,500 rows come from? Bah... doesn't matter, right? Look at the times. We wrote some excellent code! 😉
Run the code with the limiter set to 10000, like I originally posted... let it run. It'll take almost a minute and a half... what happened to only taking 10 times longer? Something is drastically wrong... and what do those arrows with 50,005,000 for a rowcount mean? Hmmm... suddenly our code isn't looking so good and [font="Arial Black"]it's the fault of correlated subquery with an inequality [/font]in it (like I said in my original post).
Finally, try running it with the limiter set to 20,000 rows (just twice as big as before) and go take a short nap. 😛 When you come back, you'll find that it took [font="Arial Black"]4 times [/font]longer than 10,000 rows and that, internally, it generated more than 200,010,000 rows for each correlated subquery. The side advantage is that you won't have to run a heater in the winter time... your CPU(s) will take care of that!
Another name for correlated subqueries with inequalities in them is a "Triangular Join"... they're a bit more than half as bad as a full Cartesian Join and you can read more about them at the following link...
http://www.sqlservercentral.com/articles/T-SQL/61539/
By any other name, improperly written correlated subqueries are HIDDEN RBAR and that's why I said that you have to "be very, very careful to consider what may constitute hidden RBAR like UDF's, Recursion, Cross Apply, and Correlated Subqueries." :w00t:
--Jeff Moden
Change is inevitable... Change for the better is not.
October 18, 2008 at 5:41 pm
Debasis,
That seems to bring us right along with what I was talking about and back to your question about the "utility of cursors"... how WOULD you do a running total if this "seemingly-set-base-but-still-RBAR-on-Steroids" isn't the way to do it? Well, yes, you could write a cursor or WHILE loop to do it and actually get it done thousands of times quicker on a million rows. Or, you can bust a hump and find a set-based way to do it... on a million rows... in 7 seconds. See the following...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
Like I said, 99.99% of the time, there is NO utility for cursors in SQL Server. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply