August 12, 2008 at 1:48 pm
I'm on the way home; I'll try to set up a test tomorrow. You can see a huge difference in only a few hundred rows. Multi-statement UDFs are just a bit problematic.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
August 12, 2008 at 2:00 pm
Grant, you seem more stressed than Gail; the point of discussion was Gail's earlier statement:
"Just watch the performance implications. A single statement tvf is usually fine, the multi-statement ones tend not to perform well if they are returning lots of rows and are part of other queries"
and this was in answer to my earlier point that a tvf is preferable to the combination of a view and a controlling sp to "dynamically filter" the view.
That was the context of the discussion.
Do you really feel that running a view that returns zillions of huge rows into a stored procedure that then selects just a few from that lot (and only a few fields from those) is more effective than writing a tvf that returns just what is needed?
August 12, 2008 at 2:03 pm
I was going to throw a rock at the hornets' nest and run, but I will chime in. I seem to write code a lot more "loosely" when I am writing it as a utility script. For example, the following script is full of things I would generally try to avoid in production code. It brings back the input buffer for each spid when run on 2000. When I shared the script, one of the questions was whether it could be rewritten without a cursor. I am just of the mentality that if I can execute a script in a second or so, and it is only used every so often, I don't really worry about it. I am including the script as a "concrete" example of where I use a cursor.
-- Capture current activity (written for SQL 2000; on 2005, sp_who2
-- returns an extra REQUESTID column that would need to be added here)
CREATE TABLE #sp_who2
(
SPID INT,
Status VARCHAR(1000) NULL,
Login SYSNAME NULL,
HostName SYSNAME NULL,
BlkBy SYSNAME NULL,
DBName SYSNAME NULL,
Command VARCHAR(1000) NULL,
CPUTime INT NULL,
DiskIO INT NULL,
LastBatch VARCHAR(1000) NULL,
ProgramName VARCHAR(1000) NULL,
SPID2 INT
)
CREATE TABLE #SqlStatement
(spid INT,
statement VARCHAR(8000))
-- DBCC INPUTBUFFER returns three columns: EventType, Parameters, EventInfo
CREATE TABLE #temp
(x VARCHAR(100),
y INT,
s VARCHAR(1000),
id INT IDENTITY(1,1))
INSERT #sp_who2 EXEC sp_who2
DECLARE @spid VARCHAR(10)
DECLARE @sql VARCHAR(1000)
DECLARE SpidCursor CURSOR
FOR SELECT spid FROM #sp_who2
OPEN SpidCursor
FETCH NEXT FROM SpidCursor
INTO @spid
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = 'DBCC INPUTBUFFER (' + @spid + ')'
INSERT #temp
EXEC (@sql)
-- keep only the rows just captured for this spid
INSERT INTO #SqlStatement
SELECT @spid, s FROM #temp WHERE id = (SELECT MAX(id) FROM #temp)
FETCH NEXT FROM SpidCursor
INTO @spid
END
CLOSE SpidCursor
DEALLOCATE SpidCursor
SELECT B.statement, A.*
FROM #sp_who2 A
LEFT JOIN #SqlStatement B ON A.SPID = B.spid
DROP TABLE #temp
DROP TABLE #SqlStatement
DROP TABLE #sp_who2
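For comparison, on 2005 the same sort of information can be pulled back set-based from the DMVs. This is just a sketch: it assumes VIEW SERVER STATE permission, and only sessions with an active request will show up.
-- Set-based sketch for SQL Server 2005+: current statement per session
-- (requires VIEW SERVER STATE; idle sessions have no row in dm_exec_requests)
SELECT r.session_id,
r.status,
r.command,
t.text AS statement
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.session_id <> @@SPID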
August 12, 2008 at 2:08 pm
Well, I'm not hand-waving; just type "TPC cursor" into Google and you will immediately find, as I did, a post by the wonderful Linchi Shea that announced to the world a while back that *all* of the TPC-E benchmarks on SQL Server up to that point had relied heavily on cursors. And then if you go further (as in, click one link) you will find an extended discussion between the likes of Adam Machanic and Greg Linwood. It is significant that two guys of this stature had the guts to admit in public that the topic is nuanced. It is also significant to consider comments such as Linwood's: "I agree that cursors are measurably slower than set based processing for simple statements such as the example you've provided, the game changes as the complexity increases."
You are following the party line, but you are also simply hand-waving. We all know that there are cases, lots of them, where eliminating cursors is helpful. To claim that this proves, somehow, that cursors are always bad is simply a logical error: "if X is sometimes an improvement over Y, then Y is always wrong." That's the FUD-mongering that has become part of SQL groupthink, and to be honest it's embarrassing since database professionals are, first and foremost, logic professionals.
I know you have made lots of great contributions here, and I fully respect the time and effort you've put into doing so. However, I would humbly suggest that you may be assuming that a certain set of use cases are all there ever are, or that we can always drive business requirements back to a simplified set of SQL use cases, and that you may need to look beyond that limited set of use cases. Surely you don't believe that the best way to post required file IOs, send emails, trigger job executions, or do other intrinsically row-by-row operations that are not directly supported by T-SQL DML would always, in every case, preclude the use of cursors?
My comment about concurrency and your comments about cursors using temp tables are actually in agreement in a way. I think we all know that cursors may use temp tables or worktables - no news there. But there are some business cases where we know that we need to do "something" to a whole bunch (millions, at least) of our "entities." In those cases, one of the first choices I have to make is do I (a) iterate over the production data, possibly inducing contention in the production table, or (b) iterate over a temporary data store of some sort. If I go for (b) I can have better control over concurrency issues without having to use NOLOCK hints, etc. So then it's a question of whether I (a) dump the data into a temp table and WHILE loop over it; (b) dump the data into a table and CURSOR over it; or (c) cursor (fast_forward, thanks) over the production table. You may say that "WHILE loops are the preferred solution" as you have before, but that's begging the original question, which was: are they always?
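For concreteness, option (c) would be something like the following sketch; the table and column names are made up:
-- Sketch of option (c): a read-only, forward-only cursor straight over
-- the production table. dbo.BigEntityTable and EntityId are hypothetical.
DECLARE @EntityId int
DECLARE EntityCursor CURSOR LOCAL FAST_FORWARD
FOR SELECT EntityId FROM dbo.BigEntityTable
OPEN EntityCursor
FETCH NEXT FROM EntityCursor INTO @EntityId
WHILE @@FETCH_STATUS = 0
BEGIN
-- do the row-by-row "something" here
FETCH NEXT FROM EntityCursor INTO @EntityId
END
CLOSE EntityCursor
DEALLOCATE EntityCursor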
As a sidebar, this reminds me of another debate that goes on sometimes (and another item on some people's "SQL Profanities" list) : using XML to pass in rowsets. In the past we have frequently used XML to pass in bundles of rows of data, and eaten the parsing cost. We have definitely proven repeatedly that sending in bundles of rows in one call is more efficient for us in our overall workload than calling onesy procs, even though that approach would entirely eliminate XML parsing of any kind. Still, there are people that say "that's always bad!" and I always wonder why. So there's another one for the discussion.
And I'm not making up the legal bit. I've been prohibited from posting articles on places like SSC and SSWUG without vetting them through our legal department first, but I have been cleared to work on various projects for Microsoft. So, just because it's easier, I sanitize what I say here and stick with the Microsoft projects. We may well wind up seeing each other in Redmond some day, and we can still disagree there. 🙂
Cheers,
Chris
August 12, 2008 at 2:33 pm
Here is an example of a cursor outperforming a set-based solution. I am not saying it can be done often, but I never say something can't be done.
August 12, 2008 at 2:40 pm
riix (8/12/2008)
Do you really feel that running a view that returns zillions of huge rows into a stored procedure that then selects just a few from that lot (and only a few fields from those) is more effective than writing a tvf that returns just what is needed?
The thing is, the entire view won't be evaluated if only a few rows are required. Since a view is just a saved SELECT statement, part of the process of parsing and compiling a query, before it is passed to the query execution engine, is replacing the view names with their definitions.
So, say you have a view defined as:
CREATE VIEW MyView1 AS
SELECT Col1, Col2 FROM MyTable
then you run a query that references that view, say as follows:
SELECT Col2 FROM MyView1 WHERE Col1 = 20
The parser, as it's parsing and binding the query, replaces the view name with its definition, resulting in something like this:
SELECT Col2 FROM
(SELECT Col1, Col2 FROM MyTable) AS MyView1
WHERE Col1 = 20
and the view becomes internally a derived table. The optimiser will always try to push predicates down as close to the tables as possible in order to return the smallest number of rows possible, and so the predicate will very likely get pushed inside the subquery (as the two forms are equivalent), resulting in a structure something like this:
SELECT Col2 FROM
(SELECT Col1, Col2 FROM MyTable WHERE Col1 = 20) AS MyView1
And hence only the qualifying rows (col1=20) will be fetched, and not the entire view.
Now, there are times when that can't happen, normally when there are aggregates within the view.
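For example, sticking with the same hypothetical MyTable, a sketch of a view where the predicate can't be pushed all the way down:
CREATE VIEW MyView2 AS
SELECT Col1, SUM(Col2) AS TotalCol2
FROM MyTable
GROUP BY Col1
-- A filter on the aggregate result can only be applied AFTER the whole
-- aggregate has been computed:
SELECT Col1, TotalCol2 FROM MyView2 WHERE TotalCol2 > 100
A filter on Col1 (the grouping column) could still be pushed below the aggregate, but the filter on TotalCol2 cannot be, so in that case more of the view has to be evaluated.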
Make sense?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 12, 2008 at 2:47 pm
Heh. You know, if you actually type "TPC Cursor" (with the quotes) into Google, the result is pretty funny.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 12, 2008 at 2:54 pm
rbarryyoung (8/12/2008)
Heh. You know, if you actually type "TPC Cursor" (with the quotes) into Google, the result is pretty funny.
That is pretty funny. (1 result to this thread)
August 12, 2008 at 2:55 pm
riix (8/12/2008)
Cuz I can even use cobol.net to create store procs?:D
Ew - I never thought I'd see COBOL mentioned again. Of all the languages I've had to code in
(Ada, APL, Assembly, COBOL, L6, LISP, Fortran [IV, V, VII, 90], Pascal, PL1), COBOL was the most tedious.
August 12, 2008 at 3:14 pm
Grant Fritchey (8/12/2008)
I'm on the way home, I'll try to set up a test tomorrow.
I saved you the trouble.
http://sqlinthewild.co.za/index.php/2008/08/12/views-or-functions/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 12, 2008 at 4:08 pm
jpowers (8/12/2008)
riix (8/12/2008)
Cuz I can even use cobol.net to create store procs?:D
Ew - I never thought I'd see COBOL mentioned again. Of all the languages I've had to code in
(Ada, APL, Assembly, COBOL, L6, LISP, Fortran [IV, V, VII, 90], Pascal, PL1), COBOL was the most tedious.
Sometimes I think COBOL should be a four-letter word. Yes, tedious, wordy, bloated. It is amazing how long a simple COBOL program can be when compared to the same program written in almost any other language.
BUT, it does a job, and does it well in many cases.
😎
August 12, 2008 at 4:29 pm
chrisleonard (8/12/2008)
Well, I'm not hand-waving; just type "TPC cursor" into Google and you will immediately find, as I did, a post by the wonderful Linchi Shea that announced to the world a while back that *all* of the TPC-E benchmarks on SQL Server up to that point had relied heavily on cursors. And then if you go further (as in, click one link) you will find an extended discussion between the likes of Adam Machanic and Greg Linwood. It is significant that two guys of this stature had the guts to admit in public that the topic is nuanced. It is also significant to consider comments such as Linwood's: "I agree that cursors are measurably slower than set based processing for simple statements such as the example you've provided, the game changes as the complexity increases."
Several points here. First, Adam Machanic never says anything there that could be taken as affirming or validating what Linchi Shea claims. Quite the opposite, in fact: Adam is trying to get Linchi to explain how what he(?) said was possible, which Linchi never does.
Secondly, Linchi never provides a verifiable example. Sure, there is an example procedure; that's step one. But there are two more steps to providing a verifiable example: table definitions and data. We still don't have those; all we have is an off-hand statement that cursors were the best choice in a certain case, which we cannot confirm and which Adam expresses frank doubts about in his own blog.
You are following the party line, but you are also simply hand-waving. We all know that there are cases, lots of them, where eliminating cursors is helpful. To claim that this proves, somehow, that cursors are always bad is simply a logical error: "if X is sometimes an improvement over Y, then Y is always wrong." That's the FUD-mongering that has become part of SQL groupthink, and to be honest it's embarrassing since database professionals are, first and foremost, logic professionals.
Please. Although I do not know you, you seem pretty smart to me and I am pretty sure that you know that sloganeering does not constitute logical or legitimate argument. "following the party line", "FUD-mongering" and "SQL groupthink" are all attempts to tar my arguments by association with negative connotations based on assumptions about both my thought processes and my affiliations which you do not know, are not in evidence here and are mostly incorrect.
My position is not the "party line", as what both Microsoft and most people practice (and then preach, in order to be self-consistent) is exactly what is stated herein: that no hard and fast rules should ever apply. However, while I am no fan of arbitrary rules, a few rules make darn good sense, and "Don't Use Cursors" is one of them. Why? Well, here's the thought process that has led me to this over the last 10 years:
1. 99+% of all instances of Cursor usage are bad. Many are so bad that companies have to pay people like me a lot of money to remedy the problems that they cause.
2. Since SQL Server 2005, I have been unable to find a single instance where Cursor usage was necessary, despite numerous claims to the contrary. Also despite public challenges.
3. Since SQL Server 2005, I have also been unable to find a single instance where Cursors were the clear best solution. Also despite public requests and challenges.
4. Long experience teaching both customers and employees has taught me one thing over and over again: If I give them ANY indication that Cursors are in ANY sense acceptable, they will use them. See #1 above for results. The pull to revert from set-based to procedural-based programming is like the gravity well of a black hole. Once they see it as possible, they cannot resist resorting to it.
5. Although I can conceive of abstract and theoretical cases where a Cursor might be the best solution for a problem, they are all arbitrary and either purposeless or just plain stupid on a database (ex: implement a Hailstone function trace).
Conclusion: Until someone can demonstrate either a case with a genuine need for cursors or where cursors are clearly the best answer, they should not be used. Even if a cursor is "as good as" some other solution and was easier to write, using it legitimizes its use for all of those 99.9% of instances where it is bad and causes real harm.
I do not think that that is either an unreasonable or an unfounded thought process: If it really is the best solution for some cases or necessary for some cases then Please: supply us with a verifiable example. Until then, saying that "Cursors are OK" just spreads harm throughout the SQL Server community.
I know you have made lots of great contributions here, and I fully respect the time and effort you've put into doing so.
Thanks, I really do appreciate that. But I also believe that who we are shouldn't really matter. I would rather that our ideas and reasoning stand on their own, with or without us.
However, I would humbly suggest that you may be assuming that a certain set of use cases are all there ever are, or that we can always drive business requirements back to a simplified set of SQL use cases, and that you may need to look beyond that limited set of use cases. Surely you don't believe that the best way to post required file IOs, send emails, trigger job executions, or do other intrinsically row-by-row operations that are not directly supported by T-SQL DML would always, in every case, preclude the use of cursors?
What I would suggest is that there are other ways to do those things and that instead of validating something that invariably gets mis-used and abused, we should use those other means. We should also put all the pressure on Microsoft that we can to provide set-based means to do those things.
My comment about concurrency and your comments about cursors using temp tables are actually in agreement in a way. I think we all know that cursors may use temp tables or worktables - no news there. But there are some business cases where we know that we need to do "something" to a whole bunch (millions, at least) of our "entities." In those cases, one of the first choices I have to make is do I (a) iterate over the production data, possibly inducing contention in the production table, or (b) iterate over a temporary data store of some sort. If I go for (b) I can have better control over concurrency issues without having to use NOLOCK hints, etc. So then it's a question of whether I (a) dump the data into a temp table and WHILE loop over it; (b) dump the data into a table and CURSOR over it; or (c) cursor (fast_forward, thanks) over the production table. You may say that "WHILE loops are the preferred solution" as you have before, but that's begging the original question, which was: are they always?
I think that my comments above cover WHILE vs Cursors. As for the locking issues, I really think that more sites should seriously consider using snapshot isolation. It is a much better solution for most of the cases like that.
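For anyone wanting to try it, turning it on is as simple as the sketch below (the database name is made up; note that the row versions go to tempdb, so keep an eye on tempdb sizing):
-- Sketch: enabling snapshot isolation on a hypothetical database MyDb
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON
-- then, per session that wants versioned reads:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT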
As a sidebar, this reminds me of another debate that goes on sometimes (and another item on some people's "SQL Profanities" list) : using XML to pass in rowsets. In the past we have frequently used XML to pass in bundles of rows of data, and eaten the parsing cost. We have definitely proven repeatedly that sending in bundles of rows in one call is more efficient for us in our overall workload than calling onesy procs, even though that approach would entirely eliminate XML parsing of any kind. Still, there are people that say "that's always bad!" and I always wonder why. So there's another one for the discussion.
Well I am no fan of XML or its frequent abuses. However, given that we are stuck with it (and we are not going to get anything better) this seems to me exactly the kind of use that it was designed for and an excellent solution for a persistent problem in the Application-Database interfacing (that is how to make the Business-Entity-Transaction a single transactional call to the database, instead of a drunken-walk zig-zag of calls, states, branches and id keys).
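For what it's worth, on 2005 shredding such a bundle server-side is short work with the xml data type's nodes()/value() methods. A sketch, with made-up element and column names:
-- Sketch (SQL Server 2005+): shredding a bundle of rows passed in as xml.
-- The <Orders>/<Order> shape and the attribute names are hypothetical.
DECLARE @rows xml
SET @rows = '<Orders><Order Id="1" Qty="5"/><Order Id="2" Qty="3"/></Orders>'
SELECT o.n.value('@Id', 'int') AS OrderId,
o.n.value('@Qty', 'int') AS Qty
FROM @rows.nodes('/Orders/Order') AS o(n)
From there the shredded rowset can be joined or inserted like any other table, which is what makes the single-call bundle approach workable.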
And I'm not making up the legal bit. I've been prohibited from posting articles on places like SSC and SSWUG without vetting them through our legal department first, but I have been cleared to work on various projects for Microsoft. So, just because it's easier, I sanitize what I say here and stick with the Microsoft projects. We may well wind up seeing each other in Redmond some day, and we can still disagree there. 🙂
That's a shame, but I understand. I'll be at PASS this year if you want to buttonhole me there. 🙂
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 12, 2008 at 4:30 pm
Ken Simmons (8/12/2008)
That's pre-2005.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 13, 2008 at 6:11 am
riix (8/12/2008)
Grant, you seem more stressed than Gail; the point of discussion was Gail's earlier statement:
"Just watch the performance implications. A single statement tvf is usually fine, the multi-statement ones tend not to perform well if they are returning lots of rows and are part of other queries"
and this was in answer to my earlier point that a tvf is preferable to the combination of a view and a controlling sp to "dynamically filter" the view.
That was the context of the discussion.
Do you really feel that running a view that returns zillions of huge rows into a stored procedure that then selects just a few from that lot (and only a few fields from those) is more effective than writing a tvf that returns just what is needed?
Yeah, I am a bit more exercised than Gail. Your statement that people are just spreading fear, uncertainty, and doubt about stuff that isn't true (and I'm sure I'm not quoting correctly) irked me. I'm sure you're a good programmer. No one has suggested that you're being less than honest. However, you've suggested that not only do we not know what we're doing, but that we're just making crap up. That is absolutely, demonstrably false.
Below is a stupid simple example with very small data sets. Run it and check the times. I don't clear the cache or anything. Everything is allowed to run and can be run multiple times. In all my local tests on this tiny little data set, the multi-statement UDF ran twice as slow as the view or the single statement UDF. This is simple data, not FUD. Multi-statement UDFs are inherently slower and can lead to very serious performance problems if they have to deal with larger data sets or, worse still, if they have to be used in joins.
Now, I'm not saying NEVER use them. I'm saying you need to be extremely cautious using them and, usually, a different approach will work better. FUD that.
Script:
CREATE TABLE dbo.Parent
(ParentId int identity(1,1)
,ParentDate datetime)
CREATE TABLE dbo.Child
(ChildId int identity(1,1)
,ParentId int
,ChildDate datetime)
DECLARE @i int
DECLARE @j int
SET @i = 1
WHILE @i < 100
BEGIN
INSERT INTO dbo.Parent
(ParentDate)
SELECT GETDATE()
SET @j = 1
WHILE @j < 100
BEGIN
INSERT INTO dbo.Child
(ParentId
,ChildDate)
SELECT @i
,GETDATE()
SET @j = @j + 1
END
SET @i = @i + 1
END
CREATE VIEW dbo.vJoin
AS
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
CREATE FUNCTION dbo.SingleUDF ()
RETURNS TABLE
AS
RETURN
(
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
)
CREATE Function dbo.MultiUDF ()
RETURNS @Multi TABLE
(ParentId int
,ParentDate datetime
,ChildId int
,ChildDate datetime)
AS
BEGIN
INSERT INTO @Multi
(ParentId
,ParentDate
,ChildId
,ChildDate)
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
RETURN
END
set statistics time on
select * from vJoin
select * from SingleUDF()
select * from MultiUDF()
set statistics time off
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
August 13, 2008 at 6:22 am
GilaMonster (8/12/2008)
Grant Fritchey (8/12/2008)
I'm on the way home, I'll try to set up a test tomorrow.
I saved you the trouble.
http://sqlinthewild.co.za/index.php/2008/08/12/views-or-functions/
Too late. I ran up a silly example and posted it above.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning