December 24, 2008 at 12:57 pm
Jeff
You know what, I have to say - would it kill you to agree with what people have said rather than repeat what other people say and try to pass it off as wisdom?
Atlantis Interactive - SQL Server Tools
My blog
Why I wrote a sql query analyzer clone
December 24, 2008 at 2:55 pm
Just for fun: if you had a banking application with millions of account records in a table and had to calculate some complex interest every day on the balance, based on each account holder's billing date, which would be better, a cursor or a set-based operation?
In both cases you have three columns you need to check and update: balance, date, and interest.
UPDATE accounts SET balance = CalcNewBal(balance, interestRate) WHERE billingDate = @today
or
DECLARE acct_cur CURSOR FOR SELECT * FROM accounts WHERE billingDate = @today
OPEN acct_cur
FETCH NEXT FROM acct_cur INTO ...
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE accounts SET balance = CalcNewBal(balance, interestRate) WHERE CURRENT OF acct_cur
FETCH NEXT FROM acct_cur INTO ...
END
CLOSE acct_cur
DEALLOCATE acct_cur
If you add just a little business complexity to the calculation, it becomes almost impossible to code as a single set-based UPDATE statement.
I don't know which of the above will be faster - I would guess the set-based one, but maybe you would be locking customers out of withdrawals for longer, or... as an earlier post said, it just depends on your application.
December 24, 2008 at 3:01 pm
The other restrictive issue is operations such as BCP, which I use a lot, since I operate in a widely distributed environment where trusted server setup has not been introduced (the vendor who owns the software is a bit inexperienced when it comes to SQL Server). The easiest way to retrieve data is therefore to use a cursor to BCP data out of the target systems before BCPing it back into my local system. And you can't use BCP against a local temp table, so I'd have to create and destroy tables with my list data as required - that is also a solution, but not great for auditing and security purposes.
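Roughly, the pattern looks like this (a sketch only: the server, path, and table filter are all hypothetical, and it assumes xp_cmdshell is enabled):
DECLARE @tbl SYSNAME, @cmd VARCHAR(500)
DECLARE tbl_cur CURSOR FAST_FORWARD FOR
    SELECT name FROM sys.tables WHERE name LIKE 'Export%' -- hypothetical filter
OPEN tbl_cur
FETCH NEXT FROM tbl_cur INTO @tbl
WHILE @@FETCH_STATUS = 0
BEGIN
    -- bcp out in native format; -T uses a trusted connection (swap for -U/-P where trust isn't set up)
    SET @cmd = 'bcp MyDb.dbo.' + @tbl + ' out C:\Transfer\' + @tbl + '.dat -S SourceSrv -T -n'
    EXEC master.dbo.xp_cmdshell @cmd
    FETCH NEXT FROM tbl_cur INTO @tbl
END
CLOSE tbl_cur
DEALLOCATE tbl_cur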
I try to avoid cursors whenever I can, but sometimes, especially with FAST_FORWARD, they're just too darn convenient. A guilty pleasure, like chocolate 🙂
Have a safe festive season everyone!
December 24, 2008 at 3:02 pm
Haroon Said (12/24/2008)
Just for fun: if you had a banking application with millions of account records in a table and had to calculate some complex interest every day on the balance, based on each account holder's billing date, which would be better, a cursor or a set-based operation? In both cases you have three columns you need to check and update: balance, date, and interest.
UPDATE accounts SET balance = CalcNewBal(balance, interestRate) WHERE billingDate = @today
or
DECLARE acct_cur CURSOR FOR SELECT * FROM accounts WHERE billingDate = @today
OPEN acct_cur
FETCH NEXT FROM acct_cur INTO ...
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE accounts SET balance = CalcNewBal(balance, interestRate) WHERE CURRENT OF acct_cur
FETCH NEXT FROM acct_cur INTO ...
END
CLOSE acct_cur
DEALLOCATE acct_cur
If you add just a little business complexity to the calculation, it becomes almost impossible to code as a single set-based UPDATE statement.
I don't know which of the above will be faster - I would guess the set-based one, but maybe you would be locking customers out of withdrawals for longer, or... as an earlier post said, it just depends on your application.
In the situation where you can write one UPDATE statement to do the job and there are millions of records, the single UPDATE will probably be orders of magnitude faster than a cursor.
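Even a fair amount of business complexity can often be folded into the statement with a CASE expression. A sketch against the same hypothetical accounts table (the rate columns and tier thresholds are made up for illustration):
-- One pass over the table; the per-row business logic lives in the CASE expression
UPDATE accounts
SET balance = balance * (1 + (CASE
                                  WHEN balance >= 100000 THEN premiumRate   -- hypothetical tiers
                                  WHEN balance >= 10000  THEN standardRate
                                  ELSE baseRate
                              END) / 365.0)
WHERE billingDate = @today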
December 24, 2008 at 4:33 pm
Matt Whitfield (12/24/2008)
@jeff - I tried to test the COALESCE method, but it was horrifically slow over about 10,000 rows. I'm also interested in why you say SELECT COALESCE(@SQL... when it's actually a lot quicker to just set @sql to '' first?
Absolutely... I agree... COALESCE is the worst (more on that in a minute). As to why I used it... habit... a really bad one, left over from the days of SQL Server 2000. Pseudo-cursor methods similar to the following are much, much faster...
DECLARE @sql VARCHAR(MAX)
SELECT @sql = (SELECT 'DBCC CheckDB(' + QUOTENAME(Name) + ')' + CHAR(10)
FROM Master.sys.Databases
WHERE Database_ID > 4
FOR XML PATH(''))
PRINT @sql
Of course, that won't work in SQL Server 2000, where most people revert to the COALESCE method or the straight method you suggested. In that case, ISNULL is a better way to go than COALESCE because it's faster (almost as fast as the straight method), and it gives the added pleasure of driving the ANSI-only folks absolutely bonkers. 😛 That's why I normally post the COALESCE solution... it keeps me from having to explain over and over that I don't care about ANSI-compliant SQL because, until the standard catches up with reality and everyone follows it to a "T", code portability is just a myth.
Here's a bit of test code I use to create test tables for this type of stuff...
--===== Create and populate a 1,000,000 row test table.
-- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
-- Jeff Moden
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2
--===== A table is not properly formed unless a Primary Key has been assigned
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)
... and here're some performance tests that use that code to demonstrate what Matt is talking about...
PRINT '===== Matt''s cited method ====='
DECLARE @SomeString VARCHAR(MAX)
SET @SomeString = ''
SET STATISTICS TIME ON
SELECT @SomeString = @SomeString + SomeCsv + CHAR(10)
FROM dbo.JBMTest
WHERE RowNum <= 10000
SET STATISTICS TIME OFF
PRINT STR(LEN(@SomeString)) + ' Characters'
PRINT REPLICATE ('=',78)
GO
-----------------------------------------------------------------------------
PRINT '===== Coalesce method ====='
DECLARE @SomeString VARCHAR(MAX)
SET STATISTICS TIME ON
SELECT @SomeString = COALESCE(@SomeString + CHAR(10),'') + SomeCsv
FROM dbo.JBMTest
WHERE RowNum <= 10000
SET STATISTICS TIME OFF
PRINT STR(LEN(@SomeString)) + ' Characters'
PRINT REPLICATE ('=',78)
GO
-----------------------------------------------------------------------------
PRINT '===== IsNull method ====='
DECLARE @SomeString VARCHAR(MAX)
SET STATISTICS TIME ON
SELECT @SomeString = ISNULL(@SomeString + CHAR(10),'') + SomeCsv
FROM dbo.JBMTest
WHERE RowNum <= 10000
SET STATISTICS TIME OFF
PRINT STR(LEN(@SomeString)) + ' Characters'
PRINT REPLICATE ('=',78)
GO
-----------------------------------------------------------------------------
PRINT '===== XML path method ====='
DECLARE @SomeString VARCHAR(MAX)
SET STATISTICS TIME ON
SELECT @SomeString = (SELECT SomeCsv + CHAR(10) FROM dbo.JBMTest WHERE RowNum <= 10000 FOR XML PATH(''))
SET STATISTICS TIME OFF
PRINT STR(LEN(@SomeString)) + ' Characters'
PRINT REPLICATE ('=',78)
GO
-----------------------------------------------------------------------------
Here's what I get for results on my six-year-old desktop box (single 1.8 GHz CPU, 1 GB RAM)...
===== Matt's cited method =====
SQL Server Execution Times:
CPU time = 41406 ms, elapsed time = 69796 ms.
700000 Characters
==============================================================================
===== Coalesce method =====
SQL Server Execution Times:
CPU time = 89078 ms, elapsed time = 106410 ms.
699999 Characters
==============================================================================
===== IsNull method =====
SQL Server Execution Times:
CPU time = 42531 ms, elapsed time = 68476 ms.
699999 Characters
==============================================================================
===== XML path method =====
SQL Server Execution Times:
CPU time = 62 ms, elapsed time = 794 ms.
700000 Characters
==============================================================================
The XML PATH method for repetitive concatenation is just nasty fast...
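One caveat to add: FOR XML PATH('') entitizes XML-special characters, so a "<" in the data comes back as "&lt;". If the strings being concatenated can contain such characters, the TYPE directive plus .value() avoids that (a sketch against the same test table):
DECLARE @SomeString VARCHAR(MAX)
SELECT @SomeString = (SELECT SomeCsv + CHAR(10)
                        FROM dbo.JBMTest
                       WHERE RowNum <= 10000
                         FOR XML PATH(''), TYPE).value('.','VARCHAR(MAX)')
PRINT STR(LEN(@SomeString)) + ' Characters'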
--Jeff Moden
Change is inevitable... Change for the better is not.
December 24, 2008 at 4:38 pm
Matt Whitfield (12/24/2008)
Jeff
You know what, I have to say - would it kill you to agree with what people have said rather than repeat what other people say and try to pass it off as wisdom?
Matt... I did agree and said so. There's also a bit to be added in retrospect. That's what discussions are about.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 24, 2008 at 5:09 pm
Matt's comment notwithstanding, I still think Jeff has a point. If he were preaching to the converted, it would be a tired old point, but far too many developers still default to RBAR thinking instead of comparing set-based vs procedural solutions based on a thorough understanding of both.
Having said that, an interesting conversation pattern is running as a background thread here:
- Tool A (e.g. cursors) gets bashed
- Someone reveals the secret passphrase (e.g. FAST_FORWARD) to make tool A "perform properly"
- This gets bashed, and the cycle repeats
It kind of reminds me of the mythical "go-faster" SQL Server startup parameter; its use by "experts" would make it simultaneously a spell and a shibboleth. Face it, much default behaviour in SQL Server is default for a good reason, and when you feel the urge to override things, there should be an accompanying urge to research just what the heck you're about to do. There's a good chance you'll be making a good choice, but if you can't even enumerate the other options, let alone analyse them, how sure can you be?
But back to the Cursor (tm) vs Gray-market Cursor vs Set-based argument. The million-row use case seems to be the interesting one here, since it contains the biggest potential to bring a system to its knees if the wrong decision is made. Here are a few thoughts...
At some data set size, you'll have to worry about what you are storing temporarily, and where. The "where" could be the server memory and/or tempdb, memory and/or disk space on the middle tier etc. Yes, the problem can be shuffled between machines, but the carpet remains lumpy. What one would presumably like to do is to minimise the amount of data persisted, for example by storing only a set of retrieved keys instead of detailed data, and then retrieving only one row of detailed data at a time for further processing. This persisted data could take the shape of a keyset cursor, a hand-rolled keyset on the server (table variable or temp table) or a dataset elsewhere.
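As a sketch of that hand-rolled keyset (the table and key column names are hypothetical):
-- Persist only the keys up front...
SELECT accountId
INTO #keys
FROM dbo.accounts
WHERE billingDate = @today

-- ...then pull one detail row at a time for further processing
DECLARE @id INT
WHILE EXISTS (SELECT 1 FROM #keys)
BEGIN
    SELECT TOP 1 @id = accountId FROM #keys ORDER BY accountId
    -- fetch and process the single detail row for @id here
    DELETE FROM #keys WHERE accountId = @id
END
DROP TABLE #keys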
Alternatively, you could hand things over to SQL Server and let it loose on a set-based solution. So will this be better? As usual, it depends, specifically on your database and query design, and on things like the concurrency situation at the time you're running it. Even if all the data retrieval for a million-row UPDATE could theoretically be done blindingly fast, you are statistically pretty likely to hit a blocking scenario if this is a main table in an OLTP environment during business hours, making a carpet-bombing update a bad choice in this scenario (yes, the old carpet again). Of course, row-by-row processing within a single transaction will land you with the exact same problem, exacerbated by being done more slowly.
But assuming that a set-based update is feasible, how do you compare apples and apples? As (I believe) Jeff has eloquently pointed out elsewhere, the internals of that query may still involve some form of row-by-row processing, and a large amount of data may still need to be cached temporarily during query processing. However, an ideal database and query design can sidestep that by avoiding memory-hungry and blocking operators. And guess what? A nice clean loop-join query plan used in a set-based solution will still be just as nice and clean when supporting a dynamic cursor!
I'm not trying to muddy the water here, just pointing out that there may be times when the cursor vs set-based argument could be a case of fiddling while Rome burns, inasmuch as both solutions may end up having their costs dominated by the same things. Yes, the cursor will ALWAYS do deeper stack traversals, but that is about the only constant.
December 25, 2008 at 4:32 am
Excellent article ...:)
December 25, 2008 at 10:58 am
Hi Gaby,
Thanks for the article. It is highly helpful from our perspective. At the least, this query will be of great help for those SPs/UDFs that are badly hit by excessive use of cursors. In this context I have just one more addition to your query: rather than deleting the data from the table variable, would it be possible to select the database by virtue of the identity value assigned to it, as I have done in the following query?
declare @query varchar(300), @count int, @counter int, @dbname sysname
declare @dblist table (dbid int IDENTITY, dbname sysname)
insert into @dblist
select name from sys.databases where name not in ('master', 'model', 'msdb', 'tempdb')
select @count = count(*) from @dblist
select @counter = 1
while @counter <= @count
begin
select @dbname = dbname from @dblist where @counter = dbid
select @query = 'dbcc checkdb(' + quotename(@dbname) + ')'
exec(@query)
Set @counter = @counter + 1
end
go
December 26, 2008 at 8:02 am
Arindam Ganguly (12/25/2008)
Hi Gaby,
Thanks for the article. It is highly helpful from our perspective. At the least, this query will be of great help for those SPs/UDFs that are badly hit by excessive use of cursors. In this context I have just one more addition to your query: rather than deleting the data from the table variable, would it be possible to select the database by virtue of the identity value assigned to it, as I have done in the following query?
declare @query varchar(300), @count int, @counter int, @dbname sysname
declare @dblist table (dbid int IDENTITY, dbname sysname)
insert into @dblist
select name from sys.databases where name not in ('master', 'model', 'msdb', 'tempdb')
select @count = count(*) from @dblist
select @counter = 1
while @counter <= @count
begin
select @dbname = dbname from @dblist where @counter = dbid
select @query = 'dbcc checkdb(' + quotename(@dbname) + ')'
exec(@query)
Set @counter = @counter + 1
end
go
Minor point, this section:
select name from sys.databases where name not in ('master', 'model', 'msdb', 'tempdb')
select @count = count(*) from @dblist
select @counter = 1
is more efficiently handled thusly:
select name from sys.databases where name not in ('master', 'model', 'msdb', 'tempdb')
select @count = @@rowcount, @counter = 1
That gets rid of an unnecessary table scan and combines two SELECTs into one.
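One thing to keep in mind with that idiom: @@ROWCOUNT reflects only the statement immediately before it, so the capture has to come right after the INSERT:
insert into @dblist
select name from sys.databases where name not in ('master', 'model', 'msdb', 'tempdb')
-- capture immediately; any intervening statement would reset @@rowcount
select @count = @@rowcount, @counter = 1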
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 26, 2008 at 2:44 pm
Jeff Moden (12/24/2008)
Matt... I did agree and said so. There's also a bit to be added in retrospect. That's what discussions are about.
Fair enough. I re-read it today and read it completely differently. That's working long hours for you 🙂
Atlantis Interactive - SQL Server Tools
My blog
Why I wrote a sql query analyzer clone
December 26, 2008 at 4:11 pm
TheSQLGuru (12/26/2008)
Arindam Ganguly (12/25/2008)
Hi Gaby,
Thanks for the article. It is highly helpful from our perspective. At the least, this query will be of great help for those SPs/UDFs that are badly hit by excessive use of cursors. In this context I have just one more addition to your query: rather than deleting the data from the table variable, would it be possible to select the database by virtue of the identity value assigned to it, as I have done in the following query?
declare @query varchar(300), @count int, @counter int, @dbname sysname
declare @dblist table (dbid int IDENTITY, dbname sysname)
insert into @dblist
select name from sys.databases where name not in ('master', 'model', 'msdb', 'tempdb')
select @count = count(*) from @dblist
select @counter = 1
while @counter <= @count
begin
select @dbname = dbname from @dblist where @counter = dbid
select @query = 'dbcc checkdb(' + quotename(@dbname) + ')'
exec(@query)
Set @counter = @counter + 1
end
go
Minor point, this section:
select name from sys.databases where name not in ('master', 'model', 'msdb', 'tempdb')
select @count = count(*) from @dblist
select @counter = 1
is more efficiently handled thusly:
select name from sys.databases where name not in ('master', 'model', 'msdb', 'tempdb')
select @count = @@rowcount, @counter = 1
That gets rid of an unnecessary table scan and combines two SELECTs into one.
Hi Kevin,
1. I think whichever loop you use, you will not notice any difference, because the DBCC will take most of the time.
2. If you really need a fast loop, have a look at C#; it is likely to run faster.
December 26, 2008 at 4:57 pm
Matt Whitfield (12/26/2008)
Jeff Moden (12/24/2008)
Matt... I did agree and said so. There's also a bit to be added in retrospect. That's what discussions are about.
Fair enough. I re-read it today and read it completely differently. That's working long hours for you 🙂
Well, heh... better that you took out a little frustration on me than somebody at work. 🙂 Thanks for the come back.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 26, 2008 at 6:46 pm
****
Hi Kevin,
1. I think whichever loop you use, you will not notice any difference, because the DBCC will take most of the time.
2. If you really need a fast loop, have a look at C#; it is likely to run faster.
****
My correction had nothing whatsoever to do with loops or runtime. It was simply correcting poor code. Unless such points are made, people cannot improve their ability to write good code. And the fixes are trivial in nature, so there is no excuse for them once someone has been taught to avoid them - not even the oft-mentioned "I don't have time to write good code because it takes too long". 😎
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 29, 2008 at 6:05 am
Gaby A. (12/24/2008)
dford (12/24/2008)
Lots of good responses here. I have to say, I don't like seeing lopsided articles about getting rid of cursors. They still exist partly because there are good uses for cursors. There are so many of these "cursors are evil" articles that SQL noobs see them and kill millions of brain cells trying to avoid using cursors. I do think cursors are a bit overused, but I would challenge anyone to accomplish some of the beastly data migrations I have had to do without a cursor. A balanced article would be more appropriate. Not everyone who reads these articles has years of SQL experience. This site shows up well in search results, and lots of new SQL developers and DBAs come here for answers. My rant on one-sided articles is done now. 😀
My apologies if it seemed lopsided. So many scripts and articles on this and other web sites are ubiquitously populated with cursors that I thought I'd take the opposite approach.
A couple things I've learned:
1. First time article, I'll be prepared next time with more thorough examples.
2. People get the most passionate about those things you expect to elicit the least amount of emotion.
Have a good holiday folks, and a happy new year.
Don't let the ******** get you down. It was a good first article. Do run the spell check on the next one though.
I think one of the best bits of feedback you've received in these responses was Jeff's: be very wary of using the estimated cost in execution plans as a measure of performance. It's especially iffy when comparing two completely different execution plans against each other. You have to include reads, CPU, and actual execution time for a complete picture.
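For instance, a minimal harness for capturing all three (wrap it around whichever query is under test; the query here is just a placeholder):
SET STATISTICS IO ON    -- logical and physical reads per table
SET STATISTICS TIME ON  -- CPU and elapsed time
SELECT COUNT(*) FROM dbo.JBMTest  -- the query under test goes here
SET STATISTICS TIME OFF
SET STATISTICS IO OFF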
Keep the articles coming.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning