June 13, 2008 at 2:24 pm
Outstanding, Kevin! Thanks for posting the code.
Yep, Matt was right... one test is worth a thousand words... let the "test-off" begin. To make it easier for everyone to test the same thing, I'll rewrite your good examples and use them against some code that will generate a million rows in just about a minute... that way, everyone can play.
I'll also throw in some set-based code to show that you should NEVER settle for a cursor or a While loop... it's too damned easy to do it set based.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 13, 2008 at 2:37 pm
Wasn't sure if you wanted the indexes at all. Here they are:
CREATE CLUSTERED INDEX [SITE_C_1] ON [dbo].[SITE]([Site_Addr_St_Name], [Site_Addr_St_Num], [Site_Addr_St_Suff], [Site_Addr_St_Post_Dir]) ON [FILEGROUP4]
GO
CREATE UNIQUE INDEX [SITE_UPKNC] ON [dbo].[SITE]([Site_Id])
GO
CREATE INDEX [SITE_NC_2] ON [dbo].[SITE]([Zoning_Dsg], [Site_Id])
GO
CREATE INDEX [SITE_NC_3] ON [dbo].[SITE]([Site_Id], [Zoning_Dsg])
GO
Haven't read enough on here to be absolutely sure what you're referring to in terms of sets. (As you can see, I'm fairly new to posting here. I probably know what they are; it's just not registering in my brain right now.)
June 13, 2008 at 2:46 pm
Well, welcome aboard, Kevin! "Set-based" is nothing more than processing sets of information instead of processing "Row By Agonizing Row" (otherwise known as "RBAR"... see my tag line below). Of course, there is some code that looks set-based (no declared loop of any kind) that can actually be worse than a cursor. I refer to that stuff as hidden RBAR. Take a look at the following to see what I mean...
http://www.sqlservercentral.com/articles/T-SQL/61539/
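Just to give a quick taste of what hidden RBAR looks like (the table and column names below are made up purely for illustration), here's a running count done with a triangular self-join... no declared loop anywhere, but it's anything but set-based in cost:

```sql
--===== Looks set-based (no declared loop), but the correlated subquery
-- forms a "triangular join"... for N rows, the engine processes roughly
-- N*(N+1)/2 rows to produce N results. That's hidden RBAR.
SELECT t1.SomeID,
       (SELECT COUNT(*)
          FROM dbo.SomeTable t2
         WHERE t2.SomeID <= t1.SomeID) AS RunningCount
  FROM dbo.SomeTable t1
```

At 10,000 rows that's about 50 million internal row touches, which is why it can be worse than an honest cursor.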
Anyway, your spirit is in the right place with you posting code and timings and all. I'll be back with some code of my own...
--Jeff Moden
Change is inevitable... Change for the better is not.
June 13, 2008 at 3:00 pm
I apologize for jumping into the middle of this thread, but I must be missing something. What's wrong with
"Select sum(SF) as TheAnswer from Site"?
Or is the whole point to benchmark a cursor loop against a while loop?
One note about benchmarking: even if you have a dedicated server machine, be careful about comparing benchmark results based on differences in GETDATE() values (subtracting start time from finish time). I call that "wall clock" benchmarking. It's OK for measuring big changes, but not super accurate if you're splitting hairs, and totally inaccurate if anything else is going on on the server.
In my opinion, it's much more accurate to start a clean login (new SPID) for each test case, run the tests, and then use SSMS's Activity Monitor or EM's Process Info to compare counts for CPU cycles, I/O, and perhaps memory used for each test.
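For example, something along these lines reports CPU time and reads per statement, and CPU time isn't thrown off nearly as much by whatever else is running on the box (a big gap between CPU and elapsed time is itself a sign of contention; the table and column names here are just the ones from earlier in the thread):

```sql
--===== Report CPU time, elapsed time, and reads for each statement below.
-- CPU time is mostly immune to other activity on the server; a large gap
-- between CPU and elapsed time usually means something else is running.
SET STATISTICS TIME ON
SET STATISTICS IO ON

SELECT SUM(Site_Frontage) FROM dbo.SITE  --the statement under test

SET STATISTICS TIME OFF
SET STATISTICS IO OFF
```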
Andy
June 13, 2008 at 3:06 pm
Oops, I meant
"Select sum(site_frontage) as TheAnswer from Site"
June 13, 2008 at 3:21 pm
Kevin Rathgeber (6/13/2008)
...(I know I could use sum(), I was just running a test on a large dataset)....
OK, never mind. Looks like I might have been preaching to the choir! :w00t:
June 13, 2008 at 3:24 pm
Oh definitely, the far preferred method would be to use the Select sum(fieldname) from table. The point of this whole thing is far greater than that.
The code above is just a very basic example of something you "could" do in a loop or cursor, not necessarily something you should do. You can remove the line of code where I'm summing the frontage value, as that's just a placeholder for some other piece of code you would typically run in a loop or cursor.
Edit: I see you beat me to it. I too kind of jumped into the middle of this.
June 13, 2008 at 3:45 pm
Sure, I just can't think of any code I'd run in a loop, cursor or otherwise! 😀
I'm serious about my comment on "wall clock" benchmarking, though. I don't know how many times a novice developer has asked me if an index was missing because their RBAR-from-hell code took 10 times longer to run today than yesterday. I had to implement a home-grown CPU monitoring utility on our development server so I could show them that it took 10 times longer today because the guy sitting in the cube next to them was running his own RBAR-from-hell code at the same time.
Of course I then sit them down and show them how the same thing can be accomplished 100 times faster than yesterday by using a set-based approach. Then they use that approach forever after. (I also believe in the Easter Bunny.) 😉
June 13, 2008 at 4:10 pm
Thanks Jeff. That was a good overview on the RBAR concept. Lucky for me I have tried to avoid that as much as possible. To tell you the truth it just seemed logical.
My favorite is when other developers I work with get a glazed over look when I mention a cartesian product to them.
Have I avoided it in every case.....god no. I'm only human. I know there is some code out there where I have RBAR implemented.
June 13, 2008 at 5:44 pm
I wanted to get some clearer results from what I had before. I am now on SQL 2005 Express. No one on the server but me. I am running each test 20 times and taking the average.
I also removed the sum function so all that is left is the skeleton code for the cursor vs. the while loop.
I ran the scenario several times over, and the while loop always won out if I specified the entire recordset, but if I specified TOP X, then the cursor always won out. The TOP X was bugging me, so I figured let's do this real-world, where you normally don't do a TOP X. So I started specifying ranges of IDs using a BETWEEN clause of differing sizes.
Here are my results against the same recordset I used earlier in the day (65528 rows)
4966 Records
Cursor: 211.35 ms
While: 175.25 ms
9926 Records
Cursor: 394.85 ms
While: 326.15 ms
19892 Records
Cursor: 829.85 ms
While: 694.37 ms
33527 Records
Cursor: 1291.75 ms
While: 1182.4 ms
65528 (All) Records
Cursor: 2499.80 ms
While: 2342.05 ms
So, using more real-world scenarios, the while loop always seems to win out, though not by much.
Jeff, does this seem right to you?
June 13, 2008 at 6:14 pm
Ok... lemme demonstrate a couple of things... first, we need some data that everyone
can use and we need lots of it because these tests run so darned fast. Instead of
using TOP in all the code examples, change the TOP in the following data generator
code to match the size of the test you want... yep, I've included some extra columns
because this is really a copy of my standard data generator and I thought I'd share it,
as well... usually takes something less than 90 seconds to build a million rows. Details
are where they belong... in comments in the code! 😉
--===== Conditionally drop the test table so reruns don't error out
IF OBJECT_ID('dbo.JBMTest') IS NOT NULL
DROP TABLE dbo.JBMTest
--===== Create and populate a 1,000,000 row test table.
-- Column "SomeID" has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney" has a range of 0.00 to 99.99 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
-- Jeff Moden
SELECT TOP 1000000 --<<< LOOK! Change this number to change the size of the test
SomeID = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN
--===== A table is not properly formed unless a Primary Key has been assigned
-- Takes about 1 second to execute.
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (SomeID)
In case you didn't notice, we just eliminated a cursor or a while loop by using a
constrained cross-join! Try building the same thing using a WHILE LOOP and see how
long building a million rows of data takes! 😉
Ok... just like before, we're just going to do a sum on a column. Yes, SUM() would be
the way to do it but, just like before, we're trying to give the code something very simple
to do. Here's the cursor code... in the spirit of all that is "holy" in SQL, all fat has been
trimmed out of the code... no extra columns are passed in.
--===== Common
DECLARE @TimeStart DATETIME
--===== Cursor
SET @TimeStart = GETDATE()
DECLARE @SomeMoney MONEY,
@SumSomeMoney MONEY
DECLARE curTest CURSOR
FOR
SELECT TOP 100000
SomeMoney
FROM dbo.JbmTest
OPEN curTest
FETCH NEXT FROM curTest
INTO @SomeMoney
SELECT @SumSomeMoney = 0
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT @SumSomeMoney = @SumSomeMoney + ISNULL(@SomeMoney,0)
FETCH NEXT FROM curTest
INTO @SomeMoney
END
CLOSE curTest
DEALLOCATE curTest
SELECT @SumSomeMoney AS Total,
CONVERT(VARCHAR(26),GETDATE()-@TimeStart,114) AS Duration
If you run that over and over for a million rows, you'll get an average close to this
(total will vary because of the random nature of the numbers)...
[font="Courier New"]Total Duration
--------------------- --------------------------
49955839.1400 00:00:36:267[/font]
Let's do the same thing with a Temp table and While loop following the same rules...
--===== Common
DECLARE @TimeStart DATETIME
SET NOCOUNT ON
--===== While Loop
SET @TimeStart = GETDATE()
DECLARE @SumSomeMoney MONEY,
@MyCount INT,
@Counter INT
SELECT IDENTITY(INT,1,1) AS RowNum,
SomeMoney
INTO #MyHead
FROM dbo.JbmTest
SELECT @MyCount = @@ROWCOUNT
ALTER TABLE #MyHead
ADD PRIMARY KEY CLUSTERED (RowNUM) WITH FILLFACTOR = 100
SELECT @SumSomeMoney = 0,
@Counter = 1
WHILE @Counter <= @MyCount
BEGIN
SELECT @SumSomeMoney = @SumSomeMoney + ISNULL(SomeMoney,0)
FROM #MyHead
WHERE RowNum = @Counter
SELECT @Counter = @Counter + 1
END
DROP TABLE #MyHead
SELECT @SumSomeMoney AS Total,
CONVERT(VARCHAR(26),GETDATE()-@TimeStart,114) AS Duration
[font="Courier New"]Total Duration
--------------------- --------------------------
49955839.1400 00:00:30:453[/font]
Oh my goodness! The Temp Table and While Loop method ran a good 6 seconds
faster! (Don't even think of running it without that primary key unless you also like
watching grass grow...). So THAT's why everyone says a WHILE LOOP is faster than a
CURSOR...
Well, it's just not true, folks. Run the following cursor code... the ONLY thing I've
changed is the cursor declaration, which now adds the FAST_FORWARD option...
--===== Common
DECLARE @TimeStart DATETIME
SET NOCOUNT ON
--===== Cursor
SET @TimeStart = GETDATE()
DECLARE @SomeMoney MONEY,
@SumSomeMoney MONEY
DECLARE curTest CURSOR FAST_FORWARD --"Firehose" cursor
FOR
SELECT SomeMoney
FROM dbo.JbmTest
OPEN curTest
FETCH NEXT FROM curTest
INTO @SomeMoney
SELECT @SumSomeMoney = 0
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT @SumSomeMoney = @SumSomeMoney + ISNULL(@SomeMoney,0)
FETCH NEXT FROM curTest
INTO @SomeMoney
END
CLOSE curTest
DEALLOCATE curTest
SELECT @SumSomeMoney AS Total,
CONVERT(VARCHAR(26),GETDATE()-@TimeStart,114) AS Duration
[font="Courier New"]Total Duration
--------------------- --------------------------
49955839.1400 00:00:31:810[/font]
Well, wadaya know... almost as fast as the While loop. And a lot less effort.
Here are the timings I got for all 3 methods across a range of row counts... it was done
on a single-user server with its own hard disks... nothing is shared...
[font="Courier New"]Rows Cursor While Loop FH Cursor
10 0.000 0 - 0.017 0.000
100 0.000 0.017 0.000
1000 0.033 0.063 0.033
10000 0.390 0.343 0.297
100000 3.563 3.327 3.203
1000000 36.267 30.453 31.810[/font]
Notice that both cursor types are faster up to the 1,000-row mark and that the firehose
cursor is faster than the While Loop all the way up through 100,000 rows...
In other words, from these simple tests which measure really base performance of the
3 methods, converting Cursors into While Loops is a bit like attending a charity orgy to
support virginity.
Oh, yeah... let's do the set based equivalent and see what happens...
[font="Courier New"]Rows Cursor While Loop FH Cursor Set Based
10 0.000 0 - 0.017 0.000 0.000
100 0.000 0.017 0.000 0.000
1000 0.033 0.063 0.033 0.000
10000 0.390 0.343 0.297 0.000
100000 3.563 3.327 3.203 0.063
1000000 36.267 30.453 31.810 0.673[/font]
I'm thinking we've pretty much destroyed the myth that either the cursor or the
temp table with a while loop is faster than the other. They both suck almost equally
badly, especially when the number of rows starts to get up there a bit.
Oh, yeah... what's the set based solution that I used? Why, the basis of all intelligent
SQL programming... we don't normally use a variable as the target of the code, but here it is...
--===== Set Based "Loop"
DECLARE @TimeStart DATETIME
SET @TimeStart = GETDATE()
DECLARE @SomeMoney MONEY,
@SumSomeMoney MONEY
SELECT @SumSomeMoney = 0
SELECT @SumSomeMoney = @SumSomeMoney + ISNULL(SomeMoney,0)
FROM dbo.jbmTest
SELECT @SumSomeMoney AS Total,
CONVERT(VARCHAR(26),GETDATE()-@TimeStart,114) AS Duration
Between the "Set Based 'Loop'", the constrained Cross Join, and the Tally table, you
can get around the need for just about any form of RBAR you can think of. Even a
simple UPDATE can be used to take the place of many forms of RBAR. Take the
necessarily procedural code for doing a running total... if you think that the only way
to do one is with a cursor or while loop or you've seen the supposedly set based
but performance challenged methods that use a triangular join, then you haven't seen
the following article...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
Now THAT's how you eliminate a cursor! 😀
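Just to give a taste of what's in that article, the running total boils down to an UPDATE of this general shape. This is only a rough sketch: it assumes a RunningTotal MONEY column has been added to the test table, and the technique depends on the clustered index order and on safety caveats that the article spells out in detail...

```sql
--===== Sketch of the UPDATE-based running total described in the article.
-- ASSUMES: a RunningTotal MONEY column has been added to dbo.JBMTest.
-- Relies on the clustered index on SomeID and on the caveats in the
-- article... read the article before trusting this in production!
DECLARE @Running MONEY
SET @Running = 0
UPDATE t
   SET @Running = t.RunningTotal = @Running + ISNULL(t.SomeMoney,0)
  FROM dbo.JBMTest t WITH (INDEX(0), TABLOCKX)
OPTION (MAXDOP 1)
```

One pass over the clustered index instead of a triangular join or a loop... that's the whole trick.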
--Jeff Moden
Change is inevitable... Change for the better is not.
June 13, 2008 at 6:49 pm
Kevin Rathgeber (6/13/2008)
So using more real world scenarios, the while always seems to win out though not by much. Jeff, does this seem right to you?
I'd pretty much agree depending on the machine you're using. As you just saw in the testing I did, they both take turns winning and one never wins over the other by very much.
There are rare exceptions, but, for the most part, RBAR is a virtual guarantee of poor performance.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 13, 2008 at 7:19 pm
Thanks Jeff for that excellent in depth explanation.
As I explained to you privately, my own personal brain fart made me miss an important piece of working with sets. Now that I understand that blunder, it's a no-brainer.
Thanks again.
June 13, 2008 at 7:41 pm
Thanks for the feedback, Kevin... and thanks for the code you posted. There are only a few people who will take the time to actually do performance tests, never mind post the results and the code they used. Very happy to have another one of the "good guys" join the forum!
--Jeff Moden
Change is inevitable... Change for the better is not.
June 14, 2008 at 7:31 am
You guys REALLY need to listen up when TheSQLGuru speaks!! 😀
1) The first post I did on this thread way back requested someone test the result with the fastest cursor definition possible: DECLARE curTest CURSOR FORWARD_ONLY READ_ONLY STATIC LOCAL. No one did that, despite Jeff running quite the suite of tests using FAST_FORWARD - which is now proven to NOT be the fastest cursor method available. See here: http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/21/curious-cursor-optimization-options.aspx.
On my laptop the above declaration for the cursor came in at 14.6 seconds, the fast_forward cursor was 17.0 sec avg with the temp table/while loop construct taking 14.8 sec avg. So my cursor still beats the while loop, even at 1M rows.
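For anyone who wants to rerun that test themselves, the only change to Jeff's million-row harness is the cursor declaration... everything else (the OPEN/FETCH loop, the timing, the final SELECT) stays exactly as he posted it:

```sql
--===== The cursor options I requested way back... a drop-in replacement
-- for the DECLARE in Jeff's FAST_FORWARD test above
DECLARE curTest CURSOR FORWARD_ONLY READ_ONLY STATIC LOCAL
FOR
SELECT SomeMoney
  FROM dbo.JBMTest
```

STATIC materializes the result set once in tempdb and LOCAL scopes the cursor to the batch, which is where the speedup over FAST_FORWARD came from on my machine... your mileage may vary by hardware.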
2) it is just plain silly in this case to put the data into a temp table to iterate the while loop over. Just use SomeID as the driver value. This code ran in 10.9 sec avg, or almost 1/3 faster:
DECLARE @TimeStart DATETIME
SET NOCOUNT ON
--===== While Loop
SET @TimeStart = GETDATE()
DECLARE @SumSomeMoney MONEY,
@MyCount INT,
@Counter INT
SELECT @MyCount = count(*) from dbo.JbmTest
SELECT @SumSomeMoney = 0,
@Counter = 1
WHILE @Counter <= @MyCount
BEGIN
SELECT @SumSomeMoney = @SumSomeMoney + ISNULL(SomeMoney,0)
FROM dbo.JbmTest
WHERE SomeID = @Counter
SELECT @Counter = @Counter + 1
END
SELECT @SumSomeMoney AS Total,
CONVERT(VARCHAR(26),GETDATE()-@TimeStart,114) AS Duration
3) I also tried a TOP 1 implementation, which I got to run in 11.5 sec avg - still a good bit faster than the cursors but slower than direct access. Note that I didn't debug this, just slapped it together, so I'm not sure it works right. Gotta run take care of the baby!
DECLARE @TimeStart DATETIME
SET NOCOUNT ON
--===== While Loop
SET @TimeStart = GETDATE()
DECLARE @SumSomeMoney MONEY, @currentmoney money,
@currentrow INT
SELECT @SumSomeMoney = 0
SELECT top 1 @currentrow = someid,
@currentmoney = somemoney
from dbo.JbmTest
order by someid
WHILE @@ROWCOUNT > 0
BEGIN
SET @SumSomeMoney = @SumSomeMoney + ISNULL(@currentmoney,0)
SELECT top 1 @currentrow = someid,
@currentmoney = somemoney
from dbo.JbmTest
WHERE SomeID > @currentrow
order by someid
END
SELECT @SumSomeMoney AS Total,
CONVERT(VARCHAR(26),GETDATE()-@TimeStart,114) AS Duration
4) And if you REALLY want row-by-row stuff to be fastest, you need to step outside TSQL altogether. CLR RBAR blows the doors off of native cursors. http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/28/poor-men-see-sharp-ndash-more-cursor-optimization.aspx
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service