Eliminating Cursors

  • Outstanding, Kevin! Thanks for posting the code.

Yep, Matt was right... one test is worth a thousand words... let the "test-off" begin. To make it easier for everyone to test the same thing, I'll rewrite your good examples and use them against some code that will generate a million rows in just about a minute... that way, everyone can play.

    I'll also throw in some set-based code to show that you should NEVER settle for a cursor or a While loop... it's too damned easy to do it set based.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Wasn't sure if you wanted the indexes at all. Here they are

CREATE CLUSTERED INDEX [SITE_C_1] ON [dbo].[SITE]([Site_Addr_St_Name], [Site_Addr_St_Num], [Site_Addr_St_Suff], [Site_Addr_St_Post_Dir]) ON [FILEGROUP4]
GO
CREATE UNIQUE INDEX [SITE_UPKNC] ON [dbo].[SITE]([Site_Id])
GO
CREATE INDEX [SITE_NC_2] ON [dbo].[SITE]([Zoning_Dsg], [Site_Id])
GO
CREATE INDEX [SITE_NC_3] ON [dbo].[SITE]([Site_Id], [Zoning_Dsg])
GO

Haven't read enough on here to be absolutely sure what you are referring to in terms of sets. (As you can see, I am fairly new to posting here. I probably know what they are; they're just not registering in my brain right now.)

Well, welcome aboard, Kevin! "Set-based" is nothing more than processing sets of information instead of processing "Row By Agonizing Row" (otherwise known as "RBAR"... see my tag line below). Of course, there is some code that looks set based (no declared loop of any kind) that can actually be worse than a cursor. I refer to that stuff as hidden RBAR. Take a look at the following to see what I mean...

    http://www.sqlservercentral.com/articles/T-SQL/61539/
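Just to give a taste of what that article means by "hidden RBAR", here's a minimal sketch of a triangular-join running total (the dbo.Account table here is just a made-up example, not from the thread). There's no declared loop anywhere, but the correlated subquery rescans the table for every row, so the work grows roughly as the square of the row count...

```sql
--===== Looks set based... no cursor, no WHILE... but it's hidden RBAR.
     -- For each of the N rows returned, the correlated subquery reads
     -- up to N rows again... a "triangular" join... roughly N*N/2 reads.
 SELECT a.AccountID,
        a.Amount,
        RunningTotal = (SELECT SUM(b.Amount)
                          FROM dbo.Account b
                         WHERE b.AccountID <= a.AccountID)
   FROM dbo.Account a
  ORDER BY a.AccountID
```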

Anyway, your spirit is in the right place with you posting code and timings and all. I'll be back with some code of my own...

    --Jeff Moden



  • I apologize for jumping into the middle of this thread, but I must be missing something. What's wrong with

    "Select sum(SF) as TheAnswer from Site"?

    Or is the whole point to benchmark a cursor loop against a while loop?

One note about benchmarking: even if you have a dedicated server machine, be careful about comparing benchmark results based on differences in GetDate() values (subtracting start time from finish time). I call that "wall clock" benchmarking. It's OK for measuring big changes, but not super accurate if you're splitting hairs, and totally inaccurate if anything else is going on on the server.

In my opinion, it is much more accurate to start a clean login (new spid) for each test case, run the tests, and then use SSMS's Activity Monitor or EM's Process Info to compare counts for CPU cycles, I/O, and perhaps memory used for each test.
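For what it's worth, an easy middle ground is to let the server itself report per-statement CPU and I/O instead of doing GetDate() math (the table and column here are just placeholders for whatever you're testing):

```sql
--===== Have SQL Server report CPU time, elapsed time, and reads for
     -- each statement... no "wall clock" subtraction required.
    SET STATISTICS TIME ON
    SET STATISTICS IO ON

 SELECT SUM(Site_Frontage)      --<<< the statement under test
   FROM dbo.Site

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
```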

    Andy

  • Oops, I meant

    "Select sum(site_frontage) as TheAnswer from Site"

  • Kevin Rathgeber (6/13/2008)


    ...(I know I could use sum(), I was just running a test on a large dataset)....

    OK, never mind. Looks like I might have been preaching to the choir! :w00t:

  • Oh definitely, the far preferred method would be to use the Select sum(fieldname) from table. The point of this whole thing is far greater than that.

The code above is just a very basic example of something you "could" do in a loop or cursor, not necessarily something you should do. You can remove the line of code where I am summing the frontage value, as that is just a placeholder for some other piece of code you would typically run in a loop or cursor.

    Edit: I see you beat me to it. I too kind of jumped into the middle of this.

  • Sure, I just can't think of any code I'd run in a loop, cursor or otherwise! 😀

    I'm serious about my comment on "Wall Clock" benchmarking, though. I don't know how many times a novice developer has asked me if an index was missing because their RBAR-from-hell code took 10 times longer to run today than yesterday. I had to implement a home-grown cpu monitoring utility on our development server so I could show them that it took 10 times longer today because the guy sitting in the cube next to them was running their own RBAR-from-hell code at the same time.

    Of course I then sit them down and show them how the same thing can be accomplished 100 times faster than yesterday by using a set-based approach. Then they use that approach forever after. (I also believe in the Easter Bunny.) 😉

  • Thanks Jeff. That was a good overview on the RBAR concept. Lucky for me I have tried to avoid that as much as possible. To tell you the truth it just seemed logical.

My favorite is when other developers I work with get a glazed-over look when I mention a Cartesian product to them.

    Have I avoided it in every case.....god no. I'm only human. I know there is some code out there where I have RBAR implemented.

  • I wanted to get some clearer results from what I had before. I am now on SQL 2005 Express. No one on the server but me. I am running each test 20 times and taking the average.

    I also removed the sum function so all that is left is the skeleton code for the cursor vs. the while loop.

I ran the scenario several times over, and the while loop always won out if I specified the entire recordset, but if I specified Top X then the cursor always won out. The Top X was bugging me, so I figured let's do this real-world, where you normally don't do a Top X. So I started specifying ranges of IDs using a BETWEEN clause of differing sizes.

Here are my results against the same recordset I used earlier in the day (65,528 rows):

Records       Cursor        While
 4,966       211.35 ms    175.25 ms
 9,926       394.85 ms    326.15 ms
19,892       829.85 ms    694.37 ms
33,527      1291.75 ms   1182.40 ms
65,528 (All) 2499.80 ms  2342.05 ms

    So using more real world scenarios, the while always seems to win out though not by much.

    Jeff, does this seem right to you?

Ok... lemme demonstrate a couple of things... first, we need some data that everyone can use, and we need lots of it because these tests run so darned fast. Instead of using TOP in all the code examples, change the TOP in the following data-generator code to match the size of the test you want... yep, I've included some extra columns because this is really a copy of my standard data generator and I thought I'd share it, as well... usually takes something less than 90 seconds to build a million rows. Details are where they belong... in comments in the code! 😉

DROP TABLE jbmtest
--===== Create and populate a 1,000,000 row test table.
     -- Column "RowNum" has a range of 1 to 100,000 unique numbers
     -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
     -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
     -- Column "SomeMoney" has a range of 0.0000 to 99.9999 non-unique numbers
     -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
     -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
     -- for all rows.
     -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
     -- Jeff Moden
 SELECT TOP 1000000 --<<< LOOK! Change this number to change the size of the test
        SomeID       = IDENTITY(INT,1,1),
        SomeInt      = ABS(CHECKSUM(NEWID()))%50000+1,
        SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                     + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
        SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
        SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
        SomeDate     = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
        SomeHex12    = RIGHT(NEWID(),12)
   INTO dbo.JBMTest
   FROM Master.dbo.SysColumns t1,
        Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

--===== A table is not properly formed unless a Primary Key has been assigned
     -- Takes about 1 second to execute.
  ALTER TABLE dbo.JBMTest
    ADD PRIMARY KEY CLUSTERED (SomeID)

In case you didn't notice, we just eliminated a cursor or a while loop by using a constrained cross-join! Try building the same thing using a WHILE loop and see how long building a million rows of data takes! 😉

Ok... just like before, we're just going to do a sum on a column. Yes, SUM() would be the way to do it but, just like before, we're trying to give the code something very simple to do. Here's the cursor code... in the spirit of all that is "holy" in SQL, all the fat has been trimmed out of the code... no extra columns are passed in.

--===== Common
DECLARE @TimeStart DATETIME
--===== Cursor
    SET @TimeStart = GETDATE()
DECLARE @SomeMoney    MONEY,
        @SumSomeMoney MONEY
DECLARE curTest CURSOR
    FOR
 SELECT TOP 100000
        SomeMoney
   FROM dbo.JbmTest
   OPEN curTest
  FETCH NEXT FROM curTest
   INTO @SomeMoney
 SELECT @SumSomeMoney = 0
  WHILE @@FETCH_STATUS = 0
  BEGIN
        SELECT @SumSomeMoney = @SumSomeMoney + ISNULL(@SomeMoney,0)
         FETCH NEXT FROM curTest
          INTO @SomeMoney
    END
  CLOSE curTest
DEALLOCATE curTest
 SELECT @SumSomeMoney AS Total,
        CONVERT(VARCHAR(26),GETDATE()-@TimeStart,114) AS Duration

If you run that over and over for a million rows, you'll get an average close to this (total will vary because of the random nature of the numbers)...

Total                 Duration
--------------------- --------------------------
49955839.1400         00:00:36:267

    Let's do the same thing with a Temp table and While loop following the same rules...

--===== Common
DECLARE @TimeStart DATETIME
    SET NOCOUNT ON
--===== While Loop
    SET @TimeStart = GETDATE()
DECLARE @SumSomeMoney MONEY,
        @MyCount      INT,
        @Counter      INT
 SELECT IDENTITY(INT,1,1) AS RowNum,
        SomeMoney
   INTO #MyHead
   FROM dbo.JbmTest
 SELECT @MyCount = @@ROWCOUNT
  ALTER TABLE #MyHead
    ADD PRIMARY KEY CLUSTERED (RowNum) WITH FILLFACTOR = 100
 SELECT @SumSomeMoney = 0,
        @Counter = 1
  WHILE @Counter <= @MyCount
  BEGIN
        SELECT @SumSomeMoney = @SumSomeMoney + ISNULL(SomeMoney,0)
          FROM #MyHead
         WHERE RowNum = @Counter
        SELECT @Counter = @Counter + 1
    END
   DROP TABLE #MyHead
 SELECT @SumSomeMoney AS Total,
        CONVERT(VARCHAR(26),GETDATE()-@TimeStart,114) AS Duration

Total                 Duration
--------------------- --------------------------
49955839.1400         00:00:30:453

Oh my goodness! The Temp Table and While Loop method ran a good 6 seconds faster! (Don't even think of running it without that primary key unless you also like watching grass grow...) So THAT's why everyone says a WHILE loop is faster than a CURSOR...

Well, it's just not true, folks. Run the following cursor code... the ONLY thing I've changed is the FAST_FORWARD option on the cursor declaration...

--===== Common
DECLARE @TimeStart DATETIME
    SET NOCOUNT ON
--===== Cursor
    SET @TimeStart = GETDATE()
DECLARE @SomeMoney    MONEY,
        @SumSomeMoney MONEY
DECLARE curTest CURSOR FAST_FORWARD --"Firehose" cursor
    FOR
 SELECT SomeMoney
   FROM dbo.JbmTest
   OPEN curTest
  FETCH NEXT FROM curTest
   INTO @SomeMoney
 SELECT @SumSomeMoney = 0
  WHILE @@FETCH_STATUS = 0
  BEGIN
        SELECT @SumSomeMoney = @SumSomeMoney + ISNULL(@SomeMoney,0)
         FETCH NEXT FROM curTest
          INTO @SomeMoney
    END
  CLOSE curTest
DEALLOCATE curTest
 SELECT @SumSomeMoney AS Total,
        CONVERT(VARCHAR(26),GETDATE()-@TimeStart,114) AS Duration

Total                 Duration
--------------------- --------------------------
49955839.1400         00:00:31:810

Well, wadaya know... almost as fast as the While loop, and a lot less effort.

Here are the timings I got for all 3 methods across a range of row counts... it was done on a single-user server with its own hard disks... nothing is shared...

Rows     Cursor   While Loop  FH Cursor
10        0.000   0 - 0.017    0.000
100       0.000   0.017        0.000
1000      0.033   0.063        0.033
10000     0.390   0.343        0.297
100000    3.563   3.327        3.203
1000000  36.267  30.453       31.810

Notice that both cursor types are faster up to the 1000-row mark and that the firehose cursor is faster than the While loop all the way up past 100,000 rows...

In other words, from these simple tests, which measure the really basic performance of the 3 methods, converting cursors into While loops is a bit like attending a charity orgy to support virginity.

    Oh, yeah... let's do the set based equivalent and see what happens...

Rows     Cursor   While Loop  FH Cursor  Set Based
10        0.000   0 - 0.017    0.000      0.000
100       0.000   0.017        0.000      0.000
1000      0.033   0.063        0.033      0.000
10000     0.390   0.343        0.297      0.000
100000    3.563   3.327        3.203      0.063
1000000  36.267  30.453       31.810      0.673

I'm thinking we've pretty much destroyed the myth that either the cursor or the temp table with a While loop is faster than the other. They both suck almost equally badly, especially when the number of rows starts to get up there a bit.

Oh, yeah... what's the set-based solution that I used? Why, the basis of all intelligent SQL programming... we don't normally use a variable as the target of the code, but here it is...

--===== Set Based "Loop"
DECLARE @TimeStart DATETIME
    SET @TimeStart = GETDATE()
DECLARE @SomeMoney    MONEY,
        @SumSomeMoney MONEY
 SELECT @SumSomeMoney = 0
 SELECT @SumSomeMoney = @SumSomeMoney + ISNULL(SomeMoney,0)
   FROM dbo.jbmTest
 SELECT @SumSomeMoney AS Total,
        CONVERT(VARCHAR(26),GETDATE()-@TimeStart,114) AS Duration

Between the "Set Based 'Loop'", the constrained cross-join, and the Tally table, you can get around the need for just about any form of RBAR you can think of. Even a simple UPDATE can be used to take the place of many forms of RBAR. Take the supposedly necessarily procedural code for doing a running total... if you think that the only way to do one is with a cursor or while loop, or you've only seen the supposedly set-based but performance-challenged methods that use a triangular join, then you haven't seen the following article...

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    Now THAT's how you eliminate a cursor! 😀
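And, to connect the dots on the Tally table mentioned above, here's a minimal sketch of the idea (assuming a dbo.Tally table of sequential numbers has already been built... the constrained cross-join above is one way to build one):

```sql
--===== Instead of WHILE @Counter <= 10000 ... generate the whole set at once.
     -- Assumes dbo.Tally(N) contains the sequential numbers 1, 2, 3, ...
 SELECT t.N,
        SomeDate = DATEADD(dd, t.N-1, '2000-01-01') --e.g. one row per day
   FROM dbo.Tally t
  WHERE t.N <= 10000
```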

    --Jeff Moden



  • Kevin Rathgeber (6/13/2008)


    So using more real world scenarios, the while always seems to win out though not by much.

    Jeff, does this seem right to you?

    I'd pretty much agree depending on the machine you're using. As you just saw in the testing I did, they both take turns winning and one never wins over the other by very much.

    There are rare exceptions, but, for the most part, RBAR is a virtual guarantee of poor performance.

    --Jeff Moden



  • Thanks Jeff for that excellent in depth explanation.

As I explained to you privately, my own personal brain fart made me miss an important piece of working with sets. Now that I understand that blunder, it's a no-brainer when I think of it.

    Thanks again.

  • Thanks for the feedback, Kevin... and thanks for the code you posted. There are only a few people who will take the time to actually do performance tests never mind posting the results and the code they used. Very happy to have another one of the "good guys" join the forum!

    --Jeff Moden



  • You guys REALLY need to listen up when TheSQLGuru speaks!! 😀

    1) The first post I did on this thread way back requested someone test the result with the fastest cursor definition possible: DECLARE curTest CURSOR FORWARD_ONLY READ_ONLY STATIC LOCAL. No one did that, despite Jeff running quite the suite of tests using FAST_FORWARD - which is now proven to NOT be the fastest cursor method available. See here: http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/21/curious-cursor-optimization-options.aspx.

    On my laptop the above declaration for the cursor came in at 14.6 seconds, the fast_forward cursor was 17.0 sec avg with the temp table/while loop construct taking 14.8 sec avg. So my cursor still beats the while loop, even at 1M rows.

    2) it is just plain silly in this case to put the data into a temp table to iterate the while loop over. Just use SomeID as the driver value. This code ran in 10.9 sec avg, or almost 1/3 faster:

DECLARE @TimeStart DATETIME
    SET NOCOUNT ON
--===== While Loop
    SET @TimeStart = GETDATE()
DECLARE @SumSomeMoney MONEY,
        @MyCount      INT,
        @Counter      INT
 SELECT @MyCount = COUNT(*) FROM dbo.JbmTest
 SELECT @SumSomeMoney = 0,
        @Counter = 1
  WHILE @Counter <= @MyCount
  BEGIN
        SELECT @SumSomeMoney = @SumSomeMoney + ISNULL(SomeMoney,0)
          FROM dbo.JbmTest
         WHERE SomeID = @Counter
        SELECT @Counter = @Counter + 1
    END
 SELECT @SumSomeMoney AS Total,
        CONVERT(VARCHAR(26),GETDATE()-@TimeStart,114) AS Duration

    3) I also tried a TOP 1 implementation which I got to run in 11.5 sec avg - still a good bit faster than the cursors but slower than direct access. Note that I didn't debug this just slapped it together so not sure if it works right. Gotta run take care of baby!

DECLARE @TimeStart DATETIME
    SET NOCOUNT ON
--===== While Loop
    SET @TimeStart = GETDATE()
DECLARE @SumSomeMoney MONEY,
        @currentmoney MONEY,
        @currentrow   INT
 SELECT @SumSomeMoney = 0
 SELECT TOP 1 @currentrow = SomeID,
        @currentmoney = SomeMoney
   FROM dbo.JbmTest
  ORDER BY SomeID
  WHILE @@ROWCOUNT > 0
  BEGIN
        SET @SumSomeMoney = @SumSomeMoney + ISNULL(@currentmoney,0)
        SELECT TOP 1 @currentrow = SomeID,
               @currentmoney = SomeMoney
          FROM dbo.JbmTest
         WHERE SomeID > @currentrow
         ORDER BY SomeID
    END
 SELECT @SumSomeMoney AS Total,
        CONVERT(VARCHAR(26),GETDATE()-@TimeStart,114) AS Duration

    4) And if you REALLY want row-by-row stuff to be fastest, you need to step outside TSQL altogether. CLR RBAR blows the doors off of native cursors. http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/28/poor-men-see-sharp-ndash-more-cursor-optimization.aspx

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

