There Must Be 15 Ways To Lose Your Cursors... part 1, Introduction

  • katedgrt (4/15/2009)


    RBarryYoung (4/14/2009)


    ... (Note: Wikipedia is using revisionist terminology that would instead call this "imperative programming", but I like the old terms better).

    I haven't heard the term revisionist terminology before but I have a need for it, and I dig your use of it here. It applies in so many situations these days, my resume included. Although I was in the Data Integration department at the Big Bank for years, my resume now reflects that work as ETL. Same concept, different century.

    Well, now I have to confess, my use of "revisionist" is probably a little bit too strong. :blush: The problem is the only other description I could think of ("latter-day" terminology) is not nearly strong enough.

    You see, "revisionist" implies an attempt to rewrite history, and that is not what really happened AFAIK, though there was (and is) some ulterior motives in re-casting traditional terms. Usually it's that someone(s) either wants to reuse the better, older words for their own initiatives or theories or they want try to recast peoples understanding of the distinction between different "classes" of things.

    In this case, I believe that the OO (object-oriented) proponents of the mid-90's wanted to rebrand the object-oriented sub-paradigm as something completely different from the more traditional 3GLs like Fortran, COBOL, etc. The distinction between these two groups of Procedural languages (languages written in the form of a procedure) is that the newer ones were "Object-Method Oriented", meaning that they created and used methods in objects, whereas the older ones were "Procedure-Oriented", meaning that they created and used procedures without objects (a method is just a procedure in an object). Notice the subtle but real difference between "Procedural" and "Procedure-Oriented".

    Now the OO folks had tried to take both 5GL and 6GL as designations earlier, but that never really went anywhere. So instead they decided to demarcate the new languages as "not Procedural languages", though they were actually Procedural languages that were object-oriented. Fast forward a few years and the distinctions of the previous paradigms have been lost, and now folks have to use a new term for the larger group of non-declarative languages, so they are now called "imperative". Yuck.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Andy DBA (4/15/2009)


    I apologize if this is somewhat off topic and there's probably tons of articles on it already, but here is a word to the wise on performance testing. I noticed GSquared and others using GetDate() and DateDiff to capture execution times. I call this "wall-clock benchmarking". If you are the only user on your server and/or are taking averages of multiple tests your comparisons may be pretty good, but any other processes running on your server (including the OS!) can throw your results way off :w00t:.

    Someone with good expertise on the guts of SQL Server please feel free to jump in here, but I highly recommend querying master.dbo.sysprocesses with the @@spid system variable to get before and after resource usage values and then taking the average of multiple iterations. (see code for one iteration below) Also, don't forget about execution plan caching. Depending on what you're testing, you may want to throw out your first set of results.

    Here's the SQL I suggest using to capture CPU usage and I/O. I think BOL explains exactly what these values mean, but for A/B comparisons on the same machine, the raw values are usually good enough.

    declare @start_cpu int
    declare @end_cpu int
    declare @start_io int
    declare @end_io int

    select @start_cpu = sum(cpu), @start_io = sum(physical_io) from master.dbo.sysprocesses where spid = @@spid

    /* Insert SQL to be performance tested here */

    select @end_cpu = sum(cpu), @end_io = sum(physical_io) from master.dbo.sysprocesses where spid = @@spid

    select @end_cpu - @start_cpu as cpu_used, @end_io - @start_io as io_used

    --Note: aggregation is probably not necessary, but if you're looking at a different spid, sysprocesses can sometimes return multiple rows.

    I absolutely agree... Delta-T's using GETDATE() are only a surface indication that something may be going right or wrong. The reason why many use it on this forum is because it's so simple. I'll frequently use SET STATISTICS IO ON and SET STATISTICS TIME ON because they actually show more information than a dip into SysProcesses does. Of course, on any looping code, those are just totally ineffective. I'm slowly but surely getting into the habit of having Profiler running for batch completions (all columns selected) with a filter on the particular SPID I happen to be testing on/with.

    The other thing is that sometimes CPU time isn't enough... it's also very necessary to see what the I/O activity is either from a cache standpoint or actual hard disk access.
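    For anyone who hasn't used SET STATISTICS IO/TIME, here is a minimal sketch of the pattern (dbo.SomeTable is just a placeholder; the CPU time, elapsed time, and reads show up on the Messages tab):

    SET STATISTICS IO ON
    SET STATISTICS TIME ON

    -- the query under test goes here
    SELECT COUNT(*) FROM dbo.SomeTable

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF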

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thomas (4/15/2009)


    However, I have, in my travels, run into situations where even though a set-based solution existed, it performed worse than a cursor in that particular version of the DBMS for that particular problem. I suppose it is akin to denormalizing. You have to know the reasons for normalizing and be versed in its use before you can consciously decide to deviate for a particular solution.

    Absolutely correct... except that most of those "set-based" solutions that perform worse than a cursor aren't actually set-based. They just look like it. Just because it has no While Loop or explicit RBAR doesn't mean it's set-based. Those poor performing solutions are giving "good" set-based code a very bad reputation. One of the more common and "deadly" reasons for such poor performance can be found in the following article...

    http://www.sqlservercentral.com/articles/T-SQL/61539/

    --Jeff Moden



  • ZA_Crafty (4/15/2009)


    gautamsheth2000 (4/13/2009)


    Should use this code 🙂

    Select count(*) From master.sys.columns C1 CROSS JOIN master.sys.columns C2

    You can even do one better.

    using the PK in count is faster than *

    ie,

    Select count(id) From master.sys.columns C1 CROSS JOIN master.sys.columns C2

    Heh... I'm thinking that no one has actually tried the second piece of code. It produces an "Invalid column name" error, because sys.columns (unlike the old dbo.syscolumns) has no "id" column.

    Also, while SELECT * can be deadly to performance, it's a myth that counting a PK column is faster than counting using COUNT(*). But don't take my word for it... test it. Since Master.sys.columns is actually a view and not a table, let's build our own million row table and see what happens...

    [font="Courier New"]--===== Create and populate a 1,000,000 row test table.

         -- Column "RowNum" has a range of 1 to 100,000 unique numbers

         -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

         -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings

         -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers

         -- Column "SomeDate" has a range of  >=01/01/2000 and <01/01/2010 non-unique date/times

         -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'

         --        for all rows.

         -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)

     SELECT TOP 1000000

            SomeID       = IDENTITY(INT,1,1),

            SomeInt      = ABS(CHECKSUM(NEWID()))%50000+1,

            SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

                         + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

            SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

            SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),

            SomeDate     = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

            SomeHex12    = RIGHT(NEWID(),12)

       INTO dbo.JBMTest

       FROM Master.dbo.SysColumns t1,

            Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned

         -- Takes about 1 second to execute.

      ALTER TABLE dbo.JBMTest

            ADD PRIMARY KEY CLUSTERED (SomeID)[/font]

    And then let's test the two counts. I recommend running these more than once because both are very fast and casual system activity can make it look like either is superior for any given run...

    [font="Courier New"]    SET STATISTICS TIME ON

     SELECT COUNT(*)      FROM dbo.JBMTest

     SELECT COUNT(SomeIDFROM dbo.JBMTest

        SET STATISTICS TIME OFF[/font]

    If you run that code with the actual execution plan turned on, you'll also see that both pieces of code do a clustered index scan... just in case no one knows what that really means, it's the same thing as a slightly more intelligent table scan because that's where the clustered index lives.

    There is no advantage to counting a given column compared to counting using "*". There may, however, be a huge disadvantage if you pick the wrong column. If you pick the PK column, then COUNT(pkcolumn) and COUNT(*) are functionally equivalent because the PK will not allow nulls. If, however, you pick a column that does allow nulls, the answer can be very different...

    [font="Courier New"] SELECT COUNT(*),COUNT(SomeCSVFROM dbo.JBMTest

     UPDATE dbo.JbmTest

        SET SomeCSV NULL

     SELECT COUNT(*),COUNT(SomeCSVFROM dbo.JBMTest[/font]

    My recommendation is that if you want to use COUNT to determine the total number of rows in a table, then use COUNT(*) and not COUNT(columnname).

    --Jeff Moden



  • dejan.kelemen (4/15/2009)


    I'm all for declarative and rarely or almost never use while loops or cursors, but how would I go about doing this without a while loop... btw, I'm writing this in a hurry, so please go easy on me 🙂

    create table ConditionTable

    ...

    Let me know how this works:

    create table ConditionTable(
    ConditionTableKey int identity(1,1) primary key not null,
    Condition1 bit,
    Condition2 bit,
    Condition3 datetime)

    create table DataTable(
    DataTableKey int identity(1,1) primary key not null,
    DataName varchar (20),
    Canceled bit,
    Active bit,
    SysDate datetime)

    insert into ConditionTable (Condition1, Condition2, Condition3) values ( 1, 1, '20090425' )
    insert into ConditionTable (Condition1, Condition2, Condition3) values ( null, 1, '20090426' )
    insert into ConditionTable (Condition1, Condition2, Condition3) values ( 0, 1, '20090427' )
    insert into ConditionTable (Condition1, Condition2, Condition3) values ( 1, 1, '20090427' )

    insert into DataTable( DataName, Canceled, Active, SysDate ) values ( 'Data1', 1, 1, '20090427' )
    insert into DataTable( DataName, Canceled, Active, SysDate ) values ( 'Data1', 0, 1, '20090426' )
    insert into DataTable( DataName, Canceled, Active, SysDate ) values ( 'Data2', 0, 1, '20090424' )
    insert into DataTable( DataName, Canceled, Active, SysDate ) values ( 'Data1', 0, 1, '20090425' )
    insert into DataTable( DataName, Canceled, Active, SysDate ) values ( 'Data3', 1, 1, '20090426' )
    insert into DataTable( DataName, Canceled, Active, SysDate ) values ( 'Data1', 1, 0, '20090427' )
    insert into DataTable( DataName, Canceled, Active, SysDate ) values ( 'Data3', 1, 0, '20090425' )

    drop table #Result

    create table #Result(
    DataTableKey int,
    DataName varchar (20),
    Canceled bit,
    Active bit,
    SysDate datetime)

    declare @MaxCondition int
    select @MaxCondition = max(ConditionTableKey)
    from ConditionTable

    Select DataTableKey, DataName, Canceled, Active, SysDate
    From (select D.*,
                 ROW_NUMBER() over(partition by d.DataName order by DataTableKey) as NameCnt
          from DataTable D
          Left Join ConditionTable C
            ON C.ConditionTableKey <= @MaxCondition
           And (c.Condition1 is null or d.Canceled = c.Condition1)
           and (c.Condition2 is null or d.Active = c.Condition2)
           and (c.Condition3 is null or d.SysDate = c.Condition3)
          ) DD
    Where DD.NameCnt = 1

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • john.arnott (4/15/2009)


    Dejan,

    This is a classic example of how powerful the row_number() function can be. I coded this solution using a CTE out of personal preference, although I suppose that part could have been a subquery. I also retained the "where.... not in result" clause, although the core query in the CTE doesn't depend on it. One more thing: the cursor-based code didn't specify the precedence whereby one data row would be preferred over the rest with the same value of dataname, so I guessed at SysDate.

    drop table #Result

    create table #Result
    (
    DataTableKey int,
    DataName varchar (20),
    Canceled bit,
    Active bit,
    SysDate datetime
    )

    ;with cte (datarownum, datatablekey, dataname, canceled, active, sysdate) as
    (
    select datarownum = row_number() over (partition by d.dataname order by d.sysdate)
          ,d.DataTableKey, d.DataName, d.Canceled, d.Active, d.SysDate
    from DataTable d
    join conditiontable c on (d.Canceled = c.condition1 or c.condition1 is null)
                         and (d.Active = c.condition2 or c.condition2 is null)
                         and (d.SysDate = c.condition3 or c.condition3 is null)
    )
    Insert into #Result
    select datatablekey, dataname, canceled, active, sysdate
    from cte
    where datarownum = 1
    and DataName not in (select DataName from #Result)

    select *
    from #Result

    Oops, John beat me to it! 🙂

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (4/15/2009)


    Thomas (4/14/2009)


    A simple example of where set based operations break down is in very large data sets. I want to take a table of 10 million rows and copy it to another table in a production system. The obvious solution would seem to be something like:

    Insert DestTable

    Select ...

    From TableA

    Yet, at the number of rows we are discussing, this could take hours to execute even on the best hardware especially if the source table and the destination table are separated by a slower link.

    This is mentioned in my article. The reason that the set-based solution is a problem is that it attempts to perform the task as quickly as possible. Because even that still takes quite a long time, it effectively (or sometimes literally) locks out access to the table for a very long time. The advantage of Cursor and WHILE loop approaches here is actually that they are so slow at it, they leave plenty of resource and access gaps for other processes to get past them. And it is easy to use "chunks" and other techniques to make them even slower. On this point, as I said in my article, I agree: a Cursor is fine IF you actually want to make something slow.

    I disagree. IMO, the reason the set-based solution is slow is that it is trying to do too much at the same time. In a chunked solution, there are checkpoints along the way that release resources. Just trying to put so many records into a single transaction is likely a huge source of the problem as the log file grows to ginormous proportions.

    That said, it should be noted that there are also other ways to accomplish the same thing that do not use cursors.

    Agreed, and almost all of those solutions involve an outside application (including SSIS) to chunk the data.

  • ... And I was wondering... 🙂

    What are your thoughts about Adam Machanic's article:

    http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/running-sums-redux.aspx ?

    I tried to compare Adam's code to other options and came up with this one that was the closest in performance, but still a bit heavier (using Profiler):

    declare @ToID int, @ID int
    DECLARE @TransactionID INT
    DECLARE @ActualCost MONEY, @RunningTotal MONEY

    SELECT ID = IDENTITY(int,1,1), convert(INT,TransactionID) as TransactionID, ActualCost
    into #tmp
    FROM Production.TransactionHistory
    ORDER BY TransactionID

    set @ToID = @@ROWCOUNT

    create unique index idx_tmp on #tmp(ID)

    DECLARE @Results TABLE
    (
    TransactionID INT NOT NULL PRIMARY KEY,
    ActualCost MONEY,
    RunningTotal MONEY
    )

    select @ID = 1, @RunningTotal = 0

    WHILE @ID <= @ToID
    begin
        SELECT @TransactionID = TransactionID,
               @ActualCost = ActualCost,
               @RunningTotal = @RunningTotal + ActualCost
        from #tmp
        WHERE ID = @ID

        INSERT @Results
        VALUES (@TransactionID, @ActualCost, @RunningTotal)

        SET @ID = @ID + 1
    end

    SELECT *
    FROM @Results
    ORDER BY TransactionID

    drop table #tmp

    It can probably still be tuned, but not by much.

    Thanks,

    Michelle.

  • RBarryYoung (4/15/2009)


    Bruce W Cassidy (4/14/2009)


    Thomas (4/14/2009)


    Set-based operations do not natively provide for chunking and thus we have cursors. Of course there are other ways around this problem, but at the end of day the only reason we are forced to consider alternate solutions is because the Hoyle set-based solution is insufficient. Even using SSIS (which is frankly a heinous beast) is simply having another procedural language loop through chunks of set-based statements.

    [font="Verdana"]Agreed (both on SSIS and on the necessity for "chunking", although I call this batching (same concept). I wouldn't use a cursor for that though -- I would use a combination of TOP and a loop.[/font]

    And I would use a "Batch" Job for that: a SQL Agent recurring Job that also uses TOP. It has a lot of advantages around built-in monitoring and control. That is, the Job History log, job step output files and the abiltiy to change both its chunk size and recurrence non-disruptively, plus I do not have to keep a live client session open for it. It will survive crashes, power-losses and helpful janitors shutting down your workstation to save energy over the weekend. It really sucks to start a 48-hour chunking/batching process on Friday evening and find out on Monday morning that some minor glitch aborted it 2 hours in.

    Yep. I agree. However, recognize that we are again having to find alternative solutions because the base set based solution is insufficient.
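    For reference, here is a minimal sketch of the TOP-and-loop batching Bruce describes (the table, key, and column names are hypothetical placeholders, not from the thread):

    -- Copy dbo.TableA to dbo.DestTable in 50,000-row batches; each batch commits
    -- on its own, so locks are released between batches and the log has a chance
    -- to clear (depending on recovery model).
    DECLARE @BatchSize INT
    SET @BatchSize = 50000

    WHILE 1 = 1
    BEGIN
        INSERT INTO dbo.DestTable (ID, Col1, Col2)
        SELECT TOP (@BatchSize) s.ID, s.Col1, s.Col2
          FROM dbo.TableA s
         WHERE NOT EXISTS (SELECT 1 FROM dbo.DestTable d WHERE d.ID = s.ID)
         ORDER BY s.ID

        IF @@ROWCOUNT < @BatchSize BREAK   -- last (partial) batch copied
    END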

  • Manie Verster (4/14/2009)


    Steve had an article a couple of days ago about breaking down code in a stored proc/query to make it more readable and editable. Sometimes cursors help you to do just that.

    Heh... I suppose it's a matter of what you "cut your teeth" on. I find cursors to be much more complicated than set based code and cursor code usually has more lines of code, to boot.

    --Jeff Moden



  • Michelle Gutzait (4/15/2009)


    ... And I was wondering... 🙂

    What are your thoughts about Adam Machanic's article:

    http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/running-sums-redux.aspx ?

    I tried to compare Adam's code to other options and came up with this one that was the closest in performance, but still a bit heavier (using Profiler)... (code snipped; it is identical to Michelle's post above)

    There is an example of what Jeff Moden calls "the quirky update" in my article if you'd like to see something fast. I'd direct you to Jeff's article, but he is currently rewriting it and it isn't available. I keep prodding him about it, as it explains well how and why it works.
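    For the curious, here is a minimal sketch of that variable-accumulation ("quirky update") pattern applied to Michelle's #tmp table. Two assumptions that are not in her original code: #tmp gets a RunningTotal column, and its index on ID is created as a unique CLUSTERED index (the technique depends on clustered-index order and single-threaded execution):

    ALTER TABLE #tmp ADD RunningTotal MONEY

    DECLARE @RunningTotal MONEY
    SET @RunningTotal = 0

    UPDATE t
       SET @RunningTotal = RunningTotal = @RunningTotal + ActualCost
      FROM #tmp t WITH (TABLOCKX)   -- serialize access while the totals are written
    OPTION (MAXDOP 1)               -- keep it single-threaded, in clustered-index order

    SELECT TransactionID, ActualCost, RunningTotal
      FROM #tmp
     ORDER BY TransactionID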

  • SanjayAttray (4/14/2009)


    Interesting article. I'll be waiting for the second installment. Cursors are bad (period), but in some situations we have to use them. It becomes necessary.

    The example pretty much shows why not to use cursors. It took 4:49 minutes to execute, while select count(1) From master..syscolumns c1 Cross Join master..syscolumns c2 takes 1 second.

    Execution plan interestingly shows

    Query 1: Query cost (relative to the batch): 0.00%

    Actually, I believe Grant Fritchey covered that in his fine book about execution plans. One of the big problems with Cursors, While Loops, Recursion, and some UDFs is that they show the estimated execution plan for just the first iteration, because the optimizer cannot anticipate more than one row without actually running the code. Makes it real hard to tune things.

    --Jeff Moden



  • James Tran (4/14/2009)


    Good article RBarry Young. I got rid of my cursor by

    - Store the result data in table var with rownumber

    - create a row index var to loop thru each record

    - while there are still record available, Select @Var1 = Col1 from @Table where @rowIdx < colRowNum

    Oh, you said "While"... one of the recommendations from many, many supposed "performance consultants" is to replace a Cursor with a Temp table and a While Loop. Lets ask the question, "What's the difference between a Cursor and a Temp Table/While Loop combination?" If the Cursor is a "firehose" cursor (read only, forward only), the answer is [font="Arial Black"]NOTHING[/font] except how you proceed to the next row. They're both RBAR, they both take approximately the same time to execute, and they'll both take approximately the same number of resources... a lot.

    The combination of Temp Tables and While Loops is simply not a reasonable substitute for Cursors. If you cannot figure out a proper (there are many improper ways) set-based solution, then changing a Cursor to a Temp Table/While Loop combo is simply an exercise in futility. In that case, just change the cursor to a firehose cursor and call it a day.
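    For reference, a "firehose" cursor is just one declared forward-only and read-only, which in T-SQL you can get with the FAST_FORWARD option (the SELECT below is only a placeholder):

    DECLARE c CURSOR LOCAL FAST_FORWARD   -- forward-only, read-only
        FOR SELECT SomeID FROM dbo.JBMTest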

    --Jeff Moden



  • Andy DBA (4/15/2009)


    So, getting back to my question, what I'm driving at is that using aggregated dynamic sql to call an sp multiple times (instead of a loop) requires more than meets the eye if the developer seeks to protect that sp by not granting execute rights on it. Unlike RBAR code, dynamic sql breaks the ownership chain because it executes in the context of the user login instead of the sp's author.

    To preserve the ownership chain (on SQL Server 2005 or better) the developer can sign the sp with a certificate or use the problematic EXECUTE AS and, voila, the dreaded cursor has been replaced per your article's claim, but this comes at the expense of increased complexity and maintenance workload.

    Ah, OK, I understand. And you're right, using dynamic SQL isn't always the pristine affair that we make it out to be in examples. However, I have been quite happy with EXECUTE AS to address the above problem; it does take a little bit of forethought and planning, but once done I haven't noticed any appreciable increase in maintenance.
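    For illustration, a minimal sketch of the EXECUTE AS approach (the procedure and table names here are hypothetical, not from the thread):

    -- The dynamic SQL runs in the owner's context, so callers only need
    -- EXECUTE on the procedure, not SELECT rights on the underlying table.
    CREATE PROCEDURE dbo.GetRowCount
        @TableName SYSNAME
    WITH EXECUTE AS OWNER
    AS
    BEGIN
        DECLARE @sql NVARCHAR(400)
        SET @sql = N'SELECT COUNT(*) FROM ' + QUOTENAME(@TableName)
        EXEC sp_executesql @sql
    END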

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Perhaps people are missing the point about the While loop for populating a set of numbers vs. the cross join. First, let's all agree that this would normally be a one-time occurrence. However, imagine for the moment that such code was needed for a real problem. The While loop as I wrote it originally is pretty close to self-documenting. Almost any developer who has written any amount of code would recognize the loop and what it is doing, even if they only had a passing familiarity with the Insert statement. (In fact, I showed both queries to some of the developers in my office and everyone understood the While loop by looking at it, and only one person (albeit the most db-savvy), after a few minutes of thought, understood what the cross join was doing.) The cross join requires a bit of analysis. Why are they using sys.columns? Why are they cross joining to them? I would want a developer using this trick to add a brief comment to explain what they were trying to accomplish with this chunk of code. Most developers would call it a "clever" solution. "Clever" in this case meaning that it is not obvious but solves the problem in an ingenious way.

    Furthermore, there are times when you simply must use a cursor. For example, if you walk into an existing infrastructure where they have encapsulated some complex logic in a stored procedure and you need to execute that stored procedure on a queried set of data, you must use a cursor. We can quibble all day about whether the logic should have been encapsulated in the stored proc in the first place, or whether you can take the logic from the stored proc and put it into a single script, or whether it is possible to put the logic in a UDF, but all of that completely misses the point. If you don't want to re-write said logic and you are required to execute that logic on each row of a set of data, then you are forced to use cursors.
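    As a concrete illustration of that scenario, here is a sketch (dbo.ProcessOrder and dbo.Orders are hypothetical; the point is the pattern of calling an existing proc once per qualifying row):

    DECLARE @OrderID INT

    DECLARE order_cur CURSOR LOCAL FAST_FORWARD
        FOR SELECT OrderID FROM dbo.Orders WHERE Status = 'Pending'

    OPEN order_cur
    FETCH NEXT FROM order_cur INTO @OrderID
    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC dbo.ProcessOrder @OrderID   -- the encapsulated logic we don't want to rewrite
        FETCH NEXT FROM order_cur INTO @OrderID
    END
    CLOSE order_cur
    DEALLOCATE order_cur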

    Should you avoid cursors as much as possible: yes. Should you assume that you can always avoid them: no. Should you forever exclude them from your arsenal of solutions: again no.
