Can this be done without cursors?

  • I had wondered about that too. Bet Itzik has something up his immense sleeve to super-charge this particular puzzle. 🙂

    Hey, what the heck are you doing playing on the forum when you are on vacation?? WHAT A GEEK!! A man after my own heart! LOL

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (11/26/2007)


    I had wondered about that too. Bet Itzik has something up his immense sleeve to super-charge this particular puzzle. 🙂

    Hey, what the heck are you doing playing on the forum when you are on vacation?? WHAT A GEEK!! A man after my own heart! LOL

    BWAHAHAHAHAHAHAHAHA (I know the feeling....)

    sounds to me like Jeff's gone and hacked SQL Server directly onto his cell phone..... It's just that the keyboard is so %$#$@@#$%^ small....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Heh... yeah, I know... "Bus-man's holiday" 😛 The really wide code is pretty tough on the display...

    Yeah... I'm a "geek"... love this stuff. Some folks do counted cross-stich for fun... I do countless cross-joins, instead :w00t:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I had wondered about that too. Bet Itzik has something up his immense sleeve to super-charge this particular puzzle.

    Heh... I'll just bet that Jeff Moden does, too! 😛

    But, let's not test on a mere 7 rows... let's test on a million rows with an average of 20 people in each group...

    --===== Create the temporary test table (NOT PART OF THE SOLUTION)

    IF OBJECT_ID('TempDb..#Scores','U') IS NOT NULL

    DROP TABLE #Scores

    SELECT TOP 1000000

    RowNum = IDENTITY(INT,1,1),

    PersonID = ISNULL(ABS(CHECKSUM(NEWID()))%50000+1,0),

    Grp = ISNULL(ABS(CHECKSUM(NEWID()))%50000+1,0),

    Score = ABS(CHECKSUM(NEWID()))%100+1

    INTO dbo.#Scores

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== Delete any dupes that may have occurred (expect about 200)

    DELETE t1

    FROM dbo.#Scores t1,

    dbo.#Scores t2

    WHERE t1.PersonID = t2.PersonID

    AND t1.Grp = t2.Grp

    AND t1.RowNum < t2.RowNum

    --===== Add same primary key as original test data

    ALTER TABLE dbo.#Scores

    ADD PRIMARY KEY CLUSTERED (PersonID,Grp)

    And, since there are 5 types of ranking according to Wiki-pedia... let's do the 3 most common (Ref: http://en.wikipedia.org/wiki/Ranking). Code deleted due to minor error... please see corrected code below...

    Dunno how long it takes on your machine, but it takes 28 seconds on mine. Not bad, huh?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Was reviewing the results of the code above, and found a small boo-boo on the "Dense Percentile"... corrected in the following code... it did take it up from 28 to 35 seconds on the million row example... but it is calculating 3 different Ranks and the associated Percentile 😛

    --===== If the working table already exists, drop it...

    IF OBJECT_ID('TempDb..#Working','U') IS NOT NULL

    DROP TABLE #Working

    --===== Create and populate the working table on the fly

    -- using the original data... makes a couple extra

    -- columns to hold the Rank and Percentile values.

    SELECT PersonID,

    Grp,

    Score,

    ORank = CAST(NULL AS INT), --"Standard" Rank

    OPercentile = CAST(NULL AS DECIMAL(3,2)),

    CRank = CAST(NULL AS INT), --"Competative" Rank

    CPercentile = CAST(NULL AS DECIMAL(3,2)),

    DRank = CAST(NULL AS INT), --"Dense" Rank

    DPercentile = CAST(NULL AS DECIMAL(3,2))

    INTO #Working

    FROM #Scores

    --===== And the "magic" index that will control the sort order of

    -- the whole process (doesn't need to be clustered, but is

    -- lot's faster if it is).

    CREATE CLUSTERED INDEX IXC_tmpWorking

    ON #Working (Grp ASC, Score DESC) WITH FILLFACTOR = 100

    --===== This next part has to be all dynamic SQL because the index above

    -- does not exist at compile time and we'd get an error about the

    -- index missing.

    DECLARE @sql VARCHAR(8000)

    SET @sql = '

    --===== Declare a couple of variables and set them to values we know are

    -- not in the table. Names make them self-documenting.

    DECLARE @CRank INT,

    @DRank INT,

    @ORank INT,

    @PrevGrp INT,

    @PrevScore INT

    SELECT @CRank = -1,

    @DRank = -1,

    @ORank = -1,

    @PrevGrp = -1,

    @PrevScore = -1

    --===== This is the code that does all of the work. It uses SQL Server''s

    -- very proprietary form of update. The order of the update is

    -- controlled by the WITH (INDEX()) directive. Note, these are NOT

    -- mere "hints" even though they are listed like that in BOL... they

    -- are directives that "Instructs SQL Server to use the specified

    -- indexes for a table." They even come with a warning that specifying

    -- an index overrides the optimizer and may cause the query to work

    -- more slowly because the index used may not be optimal for the query.

    -- Not so in this case because... heh, we know what we''re doing ;-)

    UPDATE #Working

    SET @ORank = ORank = CASE

    WHEN w.Grp = @PrevGrp

    THEN @ORank + 1

    ELSE 1

    END,

    OPercentile = (@ORank+0.0)/gc.GrpCount,

    @CRank = CRank = CASE -- Grp same, Score changed, inc by 1

    WHEN w.Grp = @PrevGrp

    AND w.Score <> @PrevScore

    THEN @ORank

    -- Grp same, Score same, same rank

    WHEN w.Grp = @PrevGrp

    AND w.Score = @PrevScore

    THEN @CRank

    -- Grp changed

    ELSE 1

    END,

    CPercentile = (@CRank+0.0)/gc.GrpCount,

    @DRank = DRank = CASE -- Grp same, Score changed, inc by 1

    WHEN w.Grp = @PrevGrp

    AND w.Score <> @PrevScore

    THEN @DRank + 1

    -- Grp same, Score same, same rank

    WHEN w.Grp = @PrevGrp

    AND w.Score = @PrevScore

    THEN @DRank

    -- Grp changed, start Rank over

    ELSE 1

    END,

    DPercentile = (@DRank+0.0)/gcs.ScoreCount,

    @PrevGrp = w.Grp,

    @PrevScore = w.Score

    FROM #Working w WITH (INDEX(IXC_tmpWorking),TABLOCKX),

    (--==== Derived table "gc" gets full counts for each group

    SELECT Grp,

    GrpCount = COUNT(*)

    FROM #Working

    GROUP BY Grp

    )gc,

    (--==== Derived table "gcs" gets count of distinct scores for each group

    SELECT Grp,

    ScoreCount = COUNT(DISTINCT Score)

    FROM #Working

    GROUP BY Grp

    )gcs

    WHERE gc.Grp = w.Grp

    AND gcs.Grp = w.Grp'

    --===== Execute the code above (including the index "hint")

    -- Don't worry... Temp table #Working is "in scope"

    EXEC (@SQL)

    --===== Display the result

    SELECT TOP 100 * FROM #Working ORDER BY Grp,SCORE Desc

    Sorry for the "mistrake" :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You know, you just GOTTA LOVE making single passes through the data. I have used that inline-variable update mechanism in the past to great effect. Almost like the characteristic functions of days gone by. Boy is THAT dating me!! 🙂 Nicely done Jeff - and the documentation provided should help a lot of forumites grok the essence so they can use the techniques shown for other processes!

    My testing shows this code smokes the cursor even at 100K rows (2 sec vs 18 sec), which is what you expect when you are doing single-pass set-based logic.

    Oh, one more thing. Does everyone else have the problem with copying out the 'code block' lines and pasting them in QA and they come out all on one line, unwrapped?? What gives with that?? Can we PLEASE have the forum-gods correct this?? Took me forever to get the code executable, and it still wasn't formatted nicely like Jeff took the time to do!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I guess others have complained about the wrapping thing too. Just saw this very timely post:

    http://www.sqlservercentral.com/articles/SQLServerCentral.com/61520/

    I still think (very strongly) this should be fixed. Can we start a poll or campaign to get this in front of the person/people who can make it happen - or is it maybe technically impossible?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • The quick & dirty way to get the formatted code until the fix is in place is to paste the code into Word (or other editor) and replace the Manual Line Breaks with Paragraph marks - works like a charm!

    ...and another thing... I noticed the use of the object type parameter in the OBJECT_ID() system function in Jeff's code and wondered... It works in SQL2K but does not appear in BOL. It 'seemed' to be new funtionality in SQL2005 and is documented. It sure would be nice to know when/if supported [assumed] functionality like that is available.

  • Oh, one more thing. Does everyone else have the problem with copying out the 'code block' lines and pasting them in QA and they come out all on one line, unwrapped?? What gives with that?? Can we PLEASE have the forum-gods correct this?? Took me forever to get the code executable, and it still wasn't formatted nicely like Jeff took the time to do!

    Better way than what the article showed...

    Put your cursor one line above the code box... click and drag to 1 line below the code box. Copy. That preserves all of the spacing for some reason. Then, do the paste to Word, replace ^l with ^p and copy to Query Analyzer. All formatting will be preserved.

    Boy is THAT dating me!! Nicely done Jeff - and the documentation provided should help a lot of forumites grok the essence so they can use the techniques shown for other processes!

    My testing shows this code smokes the cursor even at 100K rows

    Heh... us old guys rule... 😛 Thank you for the compliment.

    Someday, you're gonna have to tell me your first name.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Kevin Boles. Glad to meet you. You must be Jeff Moden, right?? 😉

    BTW, do you LIVE on this forum or something?? I needed the link to this very thread to pass to someone on the MS forums and did a search for posts you did. Seemed like pages of returns PER DAY!! SHEESH!! 🙂 If I didn't know any better I would swear that answering forum posts was your full-time paid job!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (11/29/2007)


    Kevin Boles. Glad to meet you. You must be Jeff Moden, right?? 😉

    BTW, do you LIVE on this forum or something?? I needed the link to this very thread to pass to someone on the MS forums and did a search for posts you did. Seemed like pages of returns PER DAY!! SHEESH!! 🙂 If I didn't know any better I would swear that answering forum posts was your full-time paid job!

    Yeah... Jeff Moden is correct :hehe:

    Glad to finally "meet" you, Kevin.

    Heh... I'm just addicted 😀 ... kinda of like some people do counted cross-stitch, Sudoku, or Cross-Word puzzles... it requires a fair amount of detail but it's fun and maybe even relaxing...

    ... I think I'll try to convice Steve Jones that I should get a cot with "SQLServerCentral.com" embroidered on it for going over the 5k mark 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply