Cursors Be Gone!

  • It's been interesting actually - this thread has really reminded me that it's better to ask about intent rather than mechanic. By trying to ask 'how do I achieve X' (e.g. get rid of a cursor) we haven't had much useful to say. But by taking a step back and saying 'how do I approach this problem' - we've had some nice solutions and learnt a lot in the process (well, I did anyway!).

    edit -> hmm sorry this was supposed to be a quote off jeff's last post. whoops.

    Atlantis Interactive - SQL Server Tools
    My blog[/url]
    Why I wrote a sql query analyzer clone

  • Matt Whitfield (1/3/2009)


    Jeff Moden (1/3/2009)


    Nope... unless you changed it for your server, the default Fill Factor is 90.

    Are you sure? On all the servers i've found, i get this result set for sp_configure 'fill factor'

    name,minimum,maximum,config_value,run_value

    fill factor (%),0,100,0,0

    http://msdn.microsoft.com/en-us/library/ms177459.aspx says 0 and 100 are the same in all respects - and http://msdn.microsoft.com/en-us/library/ms191005(SQL.90).aspx says the default is 0. Interestingly though, http://msdn.microsoft.com/en-us/library/ms191005.aspx (the sql 2008 version) omits the part about the default text...

    Well, I'll be... hang on a minute while I get this egg off my face {scrub, scrub, scrub, wipe, wipe, scrub, blows nose, snorts, coughs up an eggshell and a chicken's foot, wipes again}... that's better...

    Somewhere along the line, and it might have been at the very beginning of my exposure to SQL Server, I learned that the default was 90. Every server that I've ever been on that was setup by someone else has always been set to 90. In all the time I've been at this, I've never checked to see what a new server that I setup was set at... I just assumed (we know what happens there!) that it would be 90 and never gave it a second thought... until now.

    Lordy, I love it when I learn something new (although I think I feel another chicken's foot coming up)! This really explains some stuff... I've gotta do some checking...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/3/2009)


    Well, I'll be... ...snip... I've gotta do some checking...

    I'm glad i'm not the only one doing all the learning! 😛

    Atlantis Interactive - SQL Server Tools
    My blog[/url]
    Why I wrote a sql query analyzer clone

  • Jeff Moden (1/3/2009)


    Matt Whitfield (1/3/2009)


    Jeff

    You are intensely cool. 1.796. And I'll agree with what was said above - i am most impressed by the way you got the data out of that table. Ultimate respect.

    One thing though - being the sneaky chap that I am, I replaced the grab from the table with your code in my solution - and ended up with the following. This one I got to run in 1.046. Can you check it on your box and let me know?

    On my box, I got a little over 6 seconds with the clustered index on the event's table. So, on my box, your code took 2 seconds longer and on your box, my code took longer.

    Phil Factor and I have run into that a couple of times together. He and I wrote a split function that would split the novel Moby Dick at the word level. His code took relatively long on my box and my code absolutely smoked his. The exact opposite was true on his box. That's why "tuning" is such a "dark art"... there is no real black and white way of doing it. You have to "feel the machine" and "use the Force"... 😛

    I am fairly certain that one of the main causes of this variation is parallelism. Most of the laptops that we test on have 1 or 2 cores, whereas most of the servers that we test on have 4 to 8 cores. As most of the "million-row" test that Jeff likes have a lot of bulk to them, the optimizer frequently promotes parallel steps. Thats when that big difference in the number of cores can result in a huge variation in performance for some of the solutions (the ones that can be parallelized), but NOT for the others. We could set MAXDOP=1 to prevent this, but that opens up the question of how fair/realistic that is for bulk procs.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • My poor ol' (but very reliable) desktop has a single 1.8 Ghz P4... it's not even a P5. But, like I said earlier, when I get something to run fast on it, it wails on other machines... well, all except for Phil's machine 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Haroon Said (12/24/2008)


    Just for fun if you had a banking application with millions of account records in a table and had to calculate some complex interest every day on the balance based upon a account holder billing date which would be better, cursor or set based operation?

    So... just for fun... you've seen what just happened... Someone said a task was "impossible" to not use a cursor or while loop on. A couple of us came up with multiple methods that not only solved the supposedly impossible problem, but did it with such performance that most people would have thought that it didn't run at all...

    ... my question to you is the same as you've asked... "which would be better, cursor or set based operation?"

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Gaby A. (12/23/2008)


    Comments posted to this topic are about the item

    Heh... "Cursors Be Gone", indeed... see what you started? 😉 But, do try to pick a better example in the future, eh? 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • battelofhalfwits (1/2/2009)


    You seem pretty sure. Have you tested it?

    Tested / Worked with = Yes

    After all of that, you've just gotta know what's coming...

    ... prove it. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • hfxDBA (12/24/2008)


    I've always resisted responding to articles with obvious errors simply not wanting to get into arguments about which way to skin the cat. I hope people who read the original articles here also read all the comments and replies. I hear and see too many junior (and some not so junior) read articles then start using the techniques as THE way to do something.

    I can spot several things I would change with a lot of the reply scripts. I won't list those right now because I think the most important thing for readers who visit here to know is that they MUST try different techniques and record the results so that they know which technique solves their problem best for them. They must decide on what trade-offs they can risk and which they can't.

    Whenever I write an SP, USP, script, or even a TRIGGER, I comment exactly what I did and also comment exactly what alternate methods I used and why I chose the solution I did (for instance, why I chose a table variable over a temp table (or vice versa) or why I chose a cursor over a while loop). If you have several developers or DBAs who could be maintaining the code, this will help them understand your methods and may save them some time down the road.

    With all the stuff that went on in this thread, I'm sorry to say that I missed your good post. Hopefully you never have to document why you may have used a cursor over a while loop or vice versa :P, but I absolutely agree with the rest of what you've said... especially about documenting the code with the reasons [font="Arial Black"]"WHY" [/font]something was done. Any moron can figure out the "how" by reading the code (although I document the "How" of "trick" code, as well)... the "WHY" is usually much more important.

    You also have to understand that when say things like "I can spot several things I would change with a lot of the reply scripts", people are going to want you to follow that up with some code and some of that fine documentation you were just talking about. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/3/2009)


    jacroberts (1/3/2009)


    This method sounds like a bit of a dark art.

    Oh, it absolutely is... one that is condemned by many "experts" who have yet to get it to break. But one extra pinch of "Sand People Toe Nail" or leave out even 1 hair of a Jedi eye-lid, and it will burn you. However, do it all right and absolutely nothing can touch it for performance and it'll never break in the face of changing data.

    This is an incorrect statement.

    Partitioned tables CAN cause this UPDATE @var... 'trick' to give incorrect values, so there are cases where it will break. See here, the 2/1/2008 3:08am post from ALZDBA: http://www.sqlservercentral.com/Forums/Topic449802-203-2.aspx

    Parallel Operations CAN cause incorrect results too:

    http://www.sqlservercentral.com/Forums/Topic449802-203-3.aspx 2/6/2008 2:55pm post.

    I proposed those as possible points of issue, as well as Enterprise Edition merry-go-round scans as well. Testing proved the first two concerns but the EE advanced scan one didn't come up as a 'give bad results' point but that still doesn't mean it won't break the query. Not making something happen in testing of this nature does NOT guarantee the assertion. Only documentation from Microsoft can do that.

    Bottom line: DO NOT USE THIS UPDATE @VAR ... TRICK UNLESS YOU a) DON'T CARE ABOUT GETTING INCORRECT RESULTS OR b) YOU ARE VERY KNOWLEDGEABLE ABOUT YOUR SYSTEM, DATA AND THE PITFALLS OF THIS TRICK AND HOW TO HOPEFULLY WORK AROUND THEM.

    I highly encourage all who read this thread to FULLY read that entire thread. It was one of the biggest ever on this site I think and contains a wealth of information.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (1/4/2009)


    Bottom line: DO NOT USE THIS UPDATE @VAR ... TRICK UNLESS YOU a) DON'T CARE ABOUT GETTING INCORRECT RESULTS OR b) YOU ARE VERY KNOWLEDGEABLE ABOUT YOUR SYSTEM, DATA AND THE PITFALLS OF THIS TRICK AND HOW TO HOPEFULLY WORK AROUND THEM.

    Isn't that like saying 'don't take your dad's new car rallying unless you really know how to drive properly?'. They don't tend to teach handbrake turns, drifting & powerslides etc in a driving test, but they are certainly very effective ways to get cars to go quickly.

    I certainly agree that care must be taken with undocumented features - but in a movie where you needed to save the earth with a lightning fast query, perhaps this would be the one! 😛

    Atlantis Interactive - SQL Server Tools
    My blog[/url]
    Why I wrote a sql query analyzer clone

  • My rule of thumb for removing cursors is to try to replace them with a set based operation if I can.

    I would rather use several set based queries and then have one small cursor to deal with the remainder rows that the set based could not handle then to complicate queries with temporary tables.

    I know a lot of people hate cursors however with the speed and power of database servers today, I think they still have their place and always will. I, personally, have never had a problem with them.

    I also work on tables which contain millions of rows.

  • The first bit of testing is finished. Firstly I am just testing the different initial queries to get the data. So no results from the cursors yet. I've had to change the code a little as some of the [cs(Cookie)] values don't have any SessionId in them so there is a 'LIKE' statement in the sql.

    Results at the bottom.

    Here's the code:

    CREATE PROCEDURE TestSiteVisitSelects

    AS

    BEGIN

    SET NOCOUNT ON

    DECLARE @Date datetime

    SET @Date = '20090104'

    DECLARE @StartTime datetime

    SET @StartTime = GetDate()

    DECLARE @ASPString nvarchar(100)

    DECLARE @Len int

    DECLARE @DateMinus1 datetime

    DECLARE @ElevenThirty datetime

    DECLARE @Count int

    SET @ElevenThirty = '1899-12-30 23:30:00.000'

    SET @ASPString = 'ASP.NET_SessionId='

    SET @Len = Len(@ASPString)

    SET @Date = Convert(varchar, @Date, 112)

    SET @DateMinus1 = DateAdd(dd, -1, @Date)

    SELECT @Count = Count(*)

    FROM

    (

    SELECT datediff(minute, '18991230', [Time]) as EventMinuteIndex,

    CASE

    WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN

    SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])

    + @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))

    - CharIndex(@ASPString, [cs(Cookie)])- @Len)

    ELSE

    SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)

    END ASPSessionId

    FROM dbo.WebLogEvents WITH (NOLOCK)

    WHERE Date = @date

    AND [cs(Cookie)] LIKE '%ASP.NET_SessionId=%'

    UNION ALL

    SELECT datediff(minute, '18991230', [Time]) - 1440 as EventMinuteIndex,

    CASE

    WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN

    SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])

    + @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))

    - CharIndex(@ASPString, [cs(Cookie)])- @Len)

    ELSE

    SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)

    END ASPSessionId

    FROM dbo.WebLogEvents WITH (NOLOCK)

    WHERE Date = @DateMinus1 AND [Time] >= @ElevenThirty

    AND [cs(Cookie)] LIKE '%ASP.NET_SessionId=%'

    ) itbl

    PRINT 'Selected UNION ALL ' + Cast(DateDiff(ss,@StartTime, GetDate()) as varchar(10))

    + ' secs, ' + Cast(@Count AS varchar(10)) + ' rows selected'

    SET @StartTime = GetDate()

    SELECT @Count = Count(*)

    FROM

    (

    SELECT datediff(minute, '18991230', [Time]) as EventMinuteIndex,

    CASE

    WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN

    SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])

    + @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))

    - CharIndex(@ASPString, [cs(Cookie)])- @Len)

    ELSE

    SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)

    END ASPSessionId

    FROM dbo.WebLogEvents WITH (NOLOCK)

    WHERE ([date] = @date

    OR ([date] = @DateMinus1

    AND [Time] >= @ElevenThirty)

    )

    AND [cs(Cookie)] LIKE '%ASP.NET_SessionId=%'

    ) itbl

    PRINT 'Selected no UNION ALL ' + Cast(DateDiff(ss,@StartTime, GetDate()) as varchar(10))

    + ' secs, ' + Cast(@Count AS varchar(10)) + ' rows selected'

    SET @StartTime = GetDate()

    SELECT @Count = Count(*)

    FROM

    (

    SELECT CAST(SUBSTRING(Cookie,1,CHARINDEX(';',Cookie)-1) AS VARCHAR(50)) AS ASPSessionID,

    datediff(minute, @date, TheDate+2.0) AS EventMinuteIndex

    FROM (--==== Subquery does some of the calcs and isolates the start of the SessionID

    SELECT Date+Time AS TheDate,

    SUBSTRING([cs(Cookie)],CHARINDEX('ASP.NET_SessionId=',[cs(Cookie)])+18,8000)+';' AS Cookie,

    Date

    FROM dbo.WebLogEvents WITH (NOLOCK)

    WHERE Date = @Date

    OR (Date = @Date-1 AND Time >= '1899-12-30 23:30')

    AND [cs(Cookie)] LIKE '%ASP.NET_SessionId=%'

    )p1

    ) itbl

    PRINT 'Selected from select ' + Cast(DateDiff(ss,@StartTime, GetDate()) as varchar(10))

    + ' secs, ' + Cast(@Count AS varchar(10)) + ' rows selected'

    END

    GO

    When the date is set to a date that does not exist on the dateabase these are the times for the 3 queries:

    Selected UNION ALL 1348 secs, 0 rows selected

    Selected no UNION ALL 795 secs, 0 rows selected

    Selected from select 717 secs, 0 rows selected

    With data:

    Selected UNION ALL 1449 secs, 9274513 rows selected

    Selected no UNION ALL 834 secs, 9274513 rows selected

    Selected from select 739 secs, 10042133 rows selected

    There is an error in the last query as it has returned more rows.

    This is because a pair of brackets are missing form the where clause.

    The 'WHERE' clause should be:WHERE (Date = @Date

    OR (Date = @Date-1 AND Time >= '1899-12-30 23:30'))

    AND [cs(Cookie)] LIKE '%ASP.NET_SessionId=%'I'll test it again tomorrow with the code fixed.

    The sql: "select count([cs(cookie)]) from weblogevents" takes 673 secs.

    I'll test the different procedures tomorrow.

  • Looks like code blocks present smileys instead of the plain text. How can we get the sqlservercentral.com web-gurus to change that 'feature'?

    jacroberts, I imagine this query may be succeptible to parameter sniffing due to the where clause. Well, that assumes there is an index that is useful here and that a table scan isn't happening for all/most queries.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (1/5/2009)


    jacroberts, I imagine this query may be succeptible to parameter sniffing due to the where clause. Well, that assumes there is an index that is useful here and that a table scan isn't happening for all/most queries.

    There are no indexes useful to these queries on the WeblogEvents table. All queries do a full table scan and it looks like the UNION ALL one does two.

Viewing 15 posts - 151 through 165 (of 272 total)

You must be logged in to reply to this topic. Login to reply