Can this be done without cursors?

  • I have a ranking problem. I need to calculate rank and percentile for each group of people. Here is a sample data. The task is to rank people within each group based on result. Percentile is a person's rank divided by number of people in a group.

    person    group   result

    person1   1    88 

    person2   1    97 

    person3   1    100

    person4   2    48

    person5   2    50

    person6   2    30

    person7   2    50

    The result should be like the following.

    Notice that if people are tied with the same score, they get the best rank and the next person gets a rank that account for number of people tied with the same score (see persons4's rank and percentile).

    person    group   result rank  percentile

    person1   1    88    3     .1

    person2   1    97    2     .66

    person3   1    100   1    .33

    person4   2    48    3     .75

    person5   2    50    1     .25

    person6   2    30    4      .1

    person7   2    50    1     .25

    I came up with the solution that uses a cursor - for each group. Can this be done without using cursors at all? Temp tables are ok to use.

  • Personally, I would do this sort of calculation at the client front end rather than the database back end but return the records in reverse result order.

    I played around with this at the server end and found that a set based solution worked out rather expensive.

    Basically, I created a table with the 3 existing fields, plus rank, percentile and an identity column.

    I populated the table with my source records sorted in reverse order of result so the ID field acts a bit like a rank.

    Then I found the minimum ID field for each score and populated all rank columns with identical scores with that minimum ID value.

    I then created a looping routine using the rank field to update all records and set the rank to the next sequential number.

    By this time the processing time had exceeded the time taken for a cursor to do the job and I hadn't started on the percentile bit yet

  • In my opinion if you are going to do this server side you will be better of using one cursor than any other technique.

    Steps would be:

    1 create a temp table for final percentiles calculation.

    2. create a select cursor ordered by Group AND result

    3. use a varible to check the group change and reset ranking

    4. for each pass increment rank if greater than previous and insert in temp table (restet when group changes)

    5. compute percentiles set based on the temp table.

    I agree with David in the sense that this is one task in which client side shines but if you must do it on the server at least now you have some pointers in what I consider the right direction

    HTH

     


    * Noel

  • Have you tried table Variables in SQL Server ? Try to use that instead of Cursor. it will improve the performace in comparison to Cursor.

     

    --------------------------------

  • -- No cursors please

    set nocount on

    create table #scores( person char(10) not null, grp int not null, result int not null,

     primary key(person,grp) )

    insert #scores(person,grp,result)

    select 'person1', 1, 88

    union select 'person2',1,97

    union select 'person3',1,100

    union select 'person4',2,48

    union select 'person5',2,50

    union select 'person6',2,30

    union select 'person7',2,50

    select

     person,

     grp,

     result,

     rank,

     percentile = convert(numeric(4,2), cast(rank as float) / people_in_grp )

    from

     ( select

      person,

      grp,

      result,

      (  select count(*) + 1 from #scores s2

       where s1.grp = s2.grp

       and s1.result < s2.result ) as rank,

      (  select count(*) from #scores s2

       where s1.grp = s2.grp ) as people_in_grp

     from

      #scores s1 ) rankings

    order by

     person,

     grp,

     result

    drop table #scores

    set nocount off

    --/rockmoose


    You must unlearn what You have learnt

  • rockmoose,

    It's an excellent solution. I tested it and it seems to be working.

    It took me a little while to figure out what (select count(*) + 1 from #scores s2 where s1.grp = s2.grp and s1.result < s2.result) part is doing.

    Thank you

  • You are welcome ikorenbl!

    Yeah the (select count(*) + 1 from #scores s2 where s1.grp = s2.grp and s1.result < s2.result) is a bit tricky.

    One might be tempted to believe that (select count(*) from #scores s2 where s1.grp = s2.grp and s1.result <= s2.result)

    would be logically equivalent.

    But then the count will be wrong in those cases where there are several persons with the same score in the same group.

    You must unlearn what you have learnt,

    /rockmoose


    You must unlearn what You have learnt

  • hi,

    am new to programming and am doing a project now with asp.net technology..am in need of percentile function to be created...

    and saw your post...

    but could not understand completely.

    wats convert numeric(4,2)

    please let me know at the earliest as it is very urgent...!

  • -- No cursors please

    /rockmoose

    You must unlearn what You have learnt

    Rockmoose! You ol' dog! Where in the heck have you been? How funny... I quoted your tagline just the other day... couldn't remember where I got it from... now, I remember 😀

    Good to "see" you, again.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ajethaa (11/24/2007)


    hi,

    am new to programming and am doing a project now with asp.net technology..am in need of percentile function to be created...

    and saw your post...

    but could not understand completely.

    wats convert numeric(4,2)

    please let me know at the earliest as it is very urgent...!

    "Numeric(4,2)" could also be "Decimal(4,2)"... Rockmoose is just formatting the output to have a max of two digits on either side of the decimal point.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The code RockMoose wrote for solving the Rank/Percentile problem in SQL Server 2000 is a great example of "How to Avoid the Cursor/Loop". Cursor/Loop solution could not be used in a view, if needed... RockMoose' code can be.

    Heh... it's such an easy to understand example that I added my own formatting "style" to it and added it to my "Libary of SQL Tricks". Thought I'd share it with everyone... Thanks, RockMoose... good to see you again.

    /**********************************************************************************************

    Purpose:

    Demonstrate how to calculate "Rank" and "Percentile" in SQL Server 2000 without using a

    Cursor or While Loop. Since the code has neither of those nor uses a temp table to get the

    solution, the solution code could be used in a view.

    Author: RockMoose at SQLServerCentral.com

    Post: http://www.sqlservercentral.com/Forums/Topic124314-8-1.aspx#bm124434

    Date: 24 Nov 2007

    **********************************************************************************************/

    --===== Setup environment

    SET NOCOUNT ON --Supress auto-display of rowcounts for appearance & speed

    --===== Create the temporary test table (NOT PART OF THE SOLUTION)

    IF OBJECT_ID('TempDb..#Scores','U') IS NOT NULL

    DROP TABLE #Scores

    CREATE TABLE #Scores

    (

    Person VARCHAR(10),

    Grp INT,

    Score INT,

    PRIMARY KEY (Person,Grp)

    )

    --===== Populate the test table with test data from the post (NOT PART OF THE SOLUTION)

    INSERT INTO #Scores

    (Person, Grp, Score)

    SELECT 'Person1',1, 88 UNION ALL

    SELECT 'Person2',1, 97 UNION ALL

    SELECT 'Person3',1,100 UNION ALL

    SELECT 'Person4',2, 48 UNION ALL

    SELECT 'Person5',2, 50 UNION ALL

    SELECT 'Person6',2, 30 UNION ALL

    SELECT 'Person7',2, 50

    --===== Demo the "cursorless" solution

    -- Note that the "+0.0" in (r.Rank+0.0)/r.PeopleInGroup is an alternate method

    -- for converting the two integers involved to FLOAT without explicitly making

    -- the conversion... More of a personal preferance than anything else.

    -- Think of "Percentile" as "this person is in the top x.xx% of his/her class".

    SELECT Person,

    Grp,

    Score,

    r.Rank,

    Percentile = CAST((r.Rank+0.0)/r.PeopleInGroup AS DECIMAL(3,2))

    FROM (--==== Find rank and # of people in group so can find percentile above

    SELECT Person,

    Grp,

    Score,

    Rank = (--==== Creates running count using triangular join.

    -- Groups are small so triangular join won't kill us here.

    -- (Correlated subquery)

    SELECT COUNT(*)+1 --Ties for first become "1"

    FROM #Scores s2

    WHERE s1.Grp = s2.Grp

    AND s1.Score < s2.Score --Ties for first become "1"

    ),

    PeopleInGroup = (--==== Simple count of everyone in group

    -- (Correlated subquery)

    SELECT COUNT(*)

    FROM #Scores s2

    WHERE s1.Grp = s2.Grp

    )

    FROM #Scores s1

    ) r

    ORDER BY Grp, Rank, Person

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • oh..!ok..got it .I used the code here.But am trying to create a function for percentile in sql 2000.

    suppose there are N data points ,assume N =8,in a set, then if i enter a number to find the percentile value, say, 25 then it has to calculate N*25% i.e.,

    (25/100)*8=2 (x=2)

    if the x here is an integer, then it has to take the average of 2nd and 3rd data point and that will be the percentile value

    if the x there is in decimal ,say some thing like 2.8 then it has to

    take the 3rd data point and that ll be the percentile value.

    I dont have inbuilt rank function and row_number function.

    Can you please let me know how can this be done...!!

  • In the example previously given, there are 2 groups... 1 has 3 data points and the other has 4. Would you provide the "8 data points" for the example above so we can help you figure this out?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Having read this blog post recently: http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/running-sums-redux.aspx, I decided to see if a cursor solution to this problem could be more efficient than the set-based one proposed. I haven't tested it with hundreds or thousands of rows, but I did come up with a cursor that is more efficient at least on reads for a smaller set of rows.

    Populate the table, then run the cursor code and set code separately. On my laptop using Profiler to measure performance, the cursor code came in at 340 reads and 16 ms. The set code came in at 590 reads and 16 ms. Note that I did use a set-based approach for the percentile since it seemed most efficient. My gut feeling is that the larger the number of rows goes the more efficient the cursor will become. I hope someone builds an insert loop to scale it up to verify this. Time for me to start working some billable hours! :w00t:

    I hope the code pastes ok. Haven't done that before.

    /*

    --populate the table

    IF OBJECT_ID('TempDb..#Scores','U') IS NOT NULL

    DROP TABLE dbo.#Scores

    CREATE TABLE dbo.#Scores

    (

    Person VARCHAR(10),

    Grp INT,

    Score INT,

    PRIMARY KEY (Person,Grp)

    )

    --===== Populate the test table with test data from the post (NOT PART OF THE SOLUTION)

    INSERT INTO dbo.#Scores

    (Person, Grp, Score)

    SELECT 'Person1',1, 88 UNION ALL SELECT 'Person2',1, 97 UNION ALL SELECT 'Person3',1,100

    UNION ALL SELECT 'Person4',2, 48 UNION ALL SELECT 'Person5',2, 50 UNION ALL SELECT 'Person6',2, 30

    UNION ALL SELECT 'Person7',2, 50

    UNION ALL

    SELECT 'Person11',11, 88 UNION ALL SELECT 'Person12',11, 97 UNION ALL SELECT 'Person13',11,100

    UNION ALL SELECT 'Person14',12, 48 UNION ALL SELECT 'Person15',12, 50 UNION ALL SELECT 'Person16',12, 30

    UNION ALL SELECT 'Person17',12, 50

    UNION ALL

    SELECT 'Person21',21, 88 UNION ALL SELECT 'Person22',21, 97 UNION ALL SELECT 'Person23',21,100

    UNION ALL SELECT 'Person24',2, 48 UNION ALL SELECT 'Person25',2, 50 UNION ALL SELECT 'Person26',2, 30

    UNION ALL SELECT 'Person27',2, 50

    UNION ALL

    SELECT 'Person31',31, 88 UNION ALL SELECT 'Person32',31, 97 UNION ALL SELECT 'Person33',31,100

    UNION ALL SELECT 'Person34',32, 48 UNION ALL SELECT 'Person35',32, 50 UNION ALL SELECT 'Person36',32, 30

    UNION ALL SELECT 'Person37',32, 50

    UNION ALL

    SELECT 'Person41',41, 88 UNION ALL SELECT 'Person42',41, 97 UNION ALL SELECT 'Person43',41,100

    UNION ALL SELECT 'Person44',42, 48 UNION ALL SELECT 'Person45',42, 50 UNION ALL SELECT 'Person46',42, 30

    UNION ALL SELECT 'Person47',42, 50

    UNION ALL

    SELECT 'Person51',51, 88 UNION ALL SELECT 'Person52',51, 97 UNION ALL SELECT 'Person53',51,100

    UNION ALL SELECT 'Person54',51, 48 UNION ALL SELECT 'Person55',51, 50 UNION ALL SELECT 'Person56',51, 30

    UNION ALL SELECT 'Person57',51, 50

    UNION ALL

    SELECT 'Person61',51, 88 UNION ALL SELECT 'Person62',51, 97 UNION ALL SELECT 'Person63',51,100

    UNION ALL SELECT 'Person64',51, 48 UNION ALL SELECT 'Person65',51, 50 UNION ALL SELECT 'Person66',51, 30

    UNION ALL SELECT 'Person67',51, 50

    */

    set nocount on

    declare @person varchar(10), @grp int, @score int, @prevgrp int,

    @rank int, @prevscore int

    select @rank = 0, @prevscore = -9999999, @prevgrp = -9999999

    declare @ranking table (person varchar(10), grp int, score int, rank int)

    DECLARE workCursor CURSOR fast_forward for --FORWARD_ONLY READ_ONLY STATIC FOR

    SELECT person, grp, score

    FROM dbo.#scores

    ORDER BY grp asc, score desc

    OPEN workCursor

    FETCH NEXT FROM workCursor INTO @person, @grp, @score

    WHILE (@@FETCH_STATUS = 0)

    BEGIN

    IF @grp = @prevgrp

    BEGIN

    IF @score <> @prevscore

    BEGIN

    SET @rank = @rank + 1

    INSERT @ranking values (@person, @grp, @score, @rank)

    END

    ELSE

    BEGIN

    INSERT @ranking values (@person, @grp, @score, @rank)

    END

    SET @prevscore = @score

    END --end of grpflag = 0

    ELSE

    BEGIN

    SET @rank = 1

    INSERT @ranking values (@person, @grp, @score, @rank)

    SELECT @prevgrp = @grp, @prevscore = @score

    END

    FETCH NEXT FROM workCursor INTO @person, @grp, @score

    END

    CLOSE workCursor

    DEALLOCATE workCursor

    select person, r.grp, score, rank, cast((r.rank + 0.00)/t.reccount as decimal(3,2)) as Percentile

    from @ranking r inner join (select grp, count(*) as reccount from @ranking group by grp) as t on r.grp = t.grp

    --run this by itself after running the cursor

    SELECT Person,

    Grp,

    Score,

    r.Rank,

    Percentile = CAST((r.Rank+0.0)/r.PeopleInGroup AS DECIMAL(3,2))

    FROM (--==== Find rank and # of people in group so can find percentile above

    SELECT Person,

    Grp,

    Score,

    Rank = (--==== Creates running count using triangular join.

    -- Groups are small so triangular join won't kill us here.

    -- (Correlated subquery)

    SELECT COUNT(*)+1

    --Ties for first become "1"

    FROM dbo.#Scores s2

    WHERE s1.Grp = s2.Grp

    AND s1.Score < s2.Score

    --Ties for first become "1"

    ),

    PeopleInGroup = (--==== Simple count of everyone in group

    -- (Correlated subquery)

    SELECT COUNT(*)

    FROM dbo.#Scores s2

    WHERE s1.Grp = s2.Grp

    )

    FROM dbo.#Scores s1

    ) r

    ORDER BY Grp, Rank, Person

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Heh... real problem is that Adam and a whole bunch of other folks think that what they've posted as a solution (including the one I "documentd"), are "Set Based". They, in fact, are not... Triangular Joins as what everyone has used, so far, are NOT set based... they violate the first rule of "touching each row just once".

    Cursor and/or While loop will beat the hell out of triangular joins. But neither is required. I'm on vacation, right now, but if I have the time later, I'll see if I can write a set based demo that should blow the socks off the various "loop" solutions.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply