Can this be done without cursors?

Question

Can this be done without cursors?

ikorenbl

SSCommitted

Points: 1750
More actions
July 1, 2004 at 11:37 am

#161665

I have a ranking problem. I need to calculate rank and percentile for each group of people. Here is a sample data. The task is to rank people within each group based on result. Percentile is a person's rank divided by number of people in a group.
person    group   result
person1   1    88
person2   1    97
person3   1    100
person4   2    48
person5   2    50
person6   2    30
person7   2    50
The result should be like the following.
Notice that if people are tied with the same score, they get the best rank and the next person gets a rank that account for number of people tied with the same score (see persons4's rank and percentile).
person    group   result rank percentile
person1   1    88    3     .1
person2   1    97    2     .66
person3   1    100   1    .33
person4   2    48    3     .75
person5   2    50    1     .25
person6   2    30    4      .1
person7   2    50    1     .25
I came up with the solution that uses a cursor - for each group. Can this be done without using cursors at all? Temp tables are ok to use.

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply

David.Poole SSC Guru Points: 76039 More actions · Answer 1

Personally, I would do this sort of calculation at the client front end rather than the database back end but return the records in reverse result order.

I played around with this at the server end and found that a set based solution worked out rather expensive.

Basically, I created a table with the 3 existing fields, plus rank, percentile and an identity column.

I populated the table with my source records sorted in reverse order of result so the ID field acts a bit like a rank.

Then I found the minimum ID field for each score and populated all rank columns with identical scores with that minimum ID value.

I then created a looping routine using the rank field to update all records and set the rank to the next sequential number.

By this time the processing time had exceeded the time taken for a cursor to do the job and I hadn't started on the percentile bit yet

LinkedIn Profile

noeld SSC Guru Points: 96590 More actions · Answer 2

In my opinion if you are going to do this server side you will be better of using one cursor than any other technique.

Steps would be:

1 create a temp table for final percentiles calculation.

2. create a select cursor ordered by Group AND result

3. use a varible to check the group change and reset ranking

4. for each pass increment rank if greater than previous and insert in temp table (restet when group changes)

5. compute percentiles set based on the temp table.

I agree with David in the sense that this is one task in which client side shines but if you must do it on the server at least now you have some pointers in what I consider the right direction

HTH

* Noel

Jeswanth Jaishanker SSCommitted Points: 1869 More actions · Answer 3

Have you tried table Variables in SQL Server ? Try to use that instead of Cursor. it will improve the performace in comparison to Cursor.

--------------------------------

rockmoose SSCarpal Tunnel Points: 4601 More actions · Answer 4

-- No cursors please

set nocount on

create table #scores( person char(10) not null, grp int not null, result int not null,

primary key(person,grp) )

insert #scores(person,grp,result)

select 'person1', 1, 88

union select 'person2',1,97

union select 'person3',1,100

union select 'person4',2,48

union select 'person5',2,50

union select 'person6',2,30

union select 'person7',2,50

select

person,

grp,

result,

rank,

percentile = convert(numeric(4,2), cast(rank as float) / people_in_grp )

from

( select

person,

grp,

result,

( select count(*) + 1 from #scores s2

where s1.grp = s2.grp

and s1.result < s2.result ) as rank,

( select count(*) from #scores s2

where s1.grp = s2.grp ) as people_in_grp

from

#scores s1 ) rankings

order by

person,

grp,

result

drop table #scores

set nocount off

--/rockmoose

You must unlearn what You have learnt

ikorenbl SSCommitted Points: 1750 More actions · Answer 5

rockmoose,

It's an excellent solution. I tested it and it seems to be working.

It took me a little while to figure out what (select count(*) + 1 from #scores s2 where s1.grp = s2.grp and s1.result < s2.result) part is doing.

Thank you

rockmoose SSCarpal Tunnel Points: 4601 More actions · Answer 6

You are welcome ikorenbl!

Yeah the (select count(*) + 1 from #scores s2 where s1.grp = s2.grp and s1.result < s2.result) is a bit tricky.

One might be tempted to believe that (select count(*) from #scores s2 where s1.grp = s2.grp and s1.result <= s2.result)

would be logically equivalent.

But then the count will be wrong in those cases where there are several persons with the same score in the same group.

You must unlearn what you have learnt,

/rockmoose

You must unlearn what You have learnt

ajethaa SSC Rookie Points: 36 More actions · Answer 7

hi,

am new to programming and am doing a project now with asp.net technology..am in need of percentile function to be created...

and saw your post...

but could not understand completely.

wats convert numeric(4,2)

please let me know at the earliest as it is very urgent...!

Jeff Moden SSC Guru Points: 1004448 More actions · Answer 8

-- No cursors please
/rockmoose
You must unlearn what You have learnt

Rockmoose! You ol' dog! Where in the heck have you been? How funny... I quoted your tagline just the other day... couldn't remember where I got it from... now, I remember 😀

Good to "see" you, again.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004448 More actions · Answer 9

ajethaa (11/24/2007)
hi,
am new to programming and am doing a project now with asp.net technology..am in need of percentile function to be created...
and saw your post...
but could not understand completely.
wats convert numeric(4,2)
please let me know at the earliest as it is very urgent...!

"Numeric(4,2)" could also be "Decimal(4,2)"... Rockmoose is just formatting the output to have a max of two digits on either side of the decimal point.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004448 More actions · Answer 10

The code RockMoose wrote for solving the Rank/Percentile problem in SQL Server 2000 is a great example of "How to Avoid the Cursor/Loop". Cursor/Loop solution could not be used in a view, if needed... RockMoose' code can be.

Heh... it's such an easy to understand example that I added my own formatting "style" to it and added it to my "Libary of SQL Tricks". Thought I'd share it with everyone... Thanks, RockMoose... good to see you again.

/**********************************************************************************************

Purpose:

Demonstrate how to calculate "Rank" and "Percentile" in SQL Server 2000 without using a

Cursor or While Loop. Since the code has neither of those nor uses a temp table to get the

solution, the solution code could be used in a view.

Author: RockMoose at SQLServerCentral.com

Post: http://www.sqlservercentral.com/Forums/Topic124314-8-1.aspx#bm124434

Date: 24 Nov 2007

**********************************************************************************************/

--===== Setup environment

SET NOCOUNT ON --Supress auto-display of rowcounts for appearance & speed

--===== Create the temporary test table (NOT PART OF THE SOLUTION)

IF OBJECT_ID('TempDb..#Scores','U') IS NOT NULL

DROP TABLE #Scores

CREATE TABLE #Scores

(

Person VARCHAR(10),

Grp INT,

Score INT,

PRIMARY KEY (Person,Grp)

)

--===== Populate the test table with test data from the post (NOT PART OF THE SOLUTION)

INSERT INTO #Scores

(Person, Grp, Score)

SELECT 'Person1',1, 88 UNION ALL

SELECT 'Person2',1, 97 UNION ALL

SELECT 'Person3',1,100 UNION ALL

SELECT 'Person4',2, 48 UNION ALL

SELECT 'Person5',2, 50 UNION ALL

SELECT 'Person6',2, 30 UNION ALL

SELECT 'Person7',2, 50

--===== Demo the "cursorless" solution

-- Note that the "+0.0" in (r.Rank+0.0)/r.PeopleInGroup is an alternate method

-- for converting the two integers involved to FLOAT without explicitly making

-- the conversion... More of a personal preferance than anything else.

-- Think of "Percentile" as "this person is in the top x.xx% of his/her class".

SELECT Person,

Grp,

Score,

r.Rank,

Percentile = CAST((r.Rank+0.0)/r.PeopleInGroup AS DECIMAL(3,2))

FROM (--==== Find rank and # of people in group so can find percentile above

SELECT Person,

Grp,

Score,

Rank = (--==== Creates running count using triangular join.

-- Groups are small so triangular join won't kill us here.

-- (Correlated subquery)

SELECT COUNT(*)+1 --Ties for first become "1"

FROM #Scores s2

WHERE s1.Grp = s2.Grp

AND s1.Score < s2.Score --Ties for first become "1"

),

PeopleInGroup = (--==== Simple count of everyone in group

-- (Correlated subquery)

SELECT COUNT(*)

FROM #Scores s2

WHERE s1.Grp = s2.Grp

)

FROM #Scores s1

) r

ORDER BY Grp, Rank, Person

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ajethaa SSC Rookie Points: 36 More actions · Answer 11

oh..!ok..got it .I used the code here.But am trying to create a function for percentile in sql 2000.

suppose there are N data points ,assume N =8,in a set, then if i enter a number to find the percentile value, say, 25 then it has to calculate N*25% i.e.,

(25/100)*8=2 (x=2)

if the x here is an integer, then it has to take the average of 2nd and 3rd data point and that will be the percentile value

if the x there is in decimal ,say some thing like 2.8 then it has to

take the 3rd data point and that ll be the percentile value.

I dont have inbuilt rank function and row_number function.

Can you please let me know how can this be done...!!

Jeff Moden SSC Guru Points: 1004448 More actions · Answer 12

In the example previously given, there are 2 groups... 1 has 3 data points and the other has 4. Would you provide the "8 data points" for the example above so we can help you figure this out?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 13

Having read this blog post recently: http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/running-sums-redux.aspx, I decided to see if a cursor solution to this problem could be more efficient than the set-based one proposed. I haven't tested it with hundreds or thousands of rows, but I did come up with a cursor that is more efficient at least on reads for a smaller set of rows.

Populate the table, then run the cursor code and set code separately. On my laptop using Profiler to measure performance, the cursor code came in at 340 reads and 16 ms. The set code came in at 590 reads and 16 ms. Note that I did use a set-based approach for the percentile since it seemed most efficient. My gut feeling is that the larger the number of rows goes the more efficient the cursor will become. I hope someone builds an insert loop to scale it up to verify this. Time for me to start working some billable hours! :w00t:

I hope the code pastes ok. Haven't done that before.

/*

--populate the table

IF OBJECT_ID('TempDb..#Scores','U') IS NOT NULL

DROP TABLE dbo.#Scores

CREATE TABLE dbo.#Scores

(

Person VARCHAR(10),

Grp INT,

Score INT,

PRIMARY KEY (Person,Grp)

)

--===== Populate the test table with test data from the post (NOT PART OF THE SOLUTION)

INSERT INTO dbo.#Scores

(Person, Grp, Score)

SELECT 'Person1',1, 88 UNION ALL SELECT 'Person2',1, 97 UNION ALL SELECT 'Person3',1,100

UNION ALL SELECT 'Person4',2, 48 UNION ALL SELECT 'Person5',2, 50 UNION ALL SELECT 'Person6',2, 30

UNION ALL SELECT 'Person7',2, 50

UNION ALL

SELECT 'Person11',11, 88 UNION ALL SELECT 'Person12',11, 97 UNION ALL SELECT 'Person13',11,100

UNION ALL SELECT 'Person14',12, 48 UNION ALL SELECT 'Person15',12, 50 UNION ALL SELECT 'Person16',12, 30

UNION ALL SELECT 'Person17',12, 50

UNION ALL

SELECT 'Person21',21, 88 UNION ALL SELECT 'Person22',21, 97 UNION ALL SELECT 'Person23',21,100

UNION ALL SELECT 'Person24',2, 48 UNION ALL SELECT 'Person25',2, 50 UNION ALL SELECT 'Person26',2, 30

UNION ALL SELECT 'Person27',2, 50

UNION ALL

SELECT 'Person31',31, 88 UNION ALL SELECT 'Person32',31, 97 UNION ALL SELECT 'Person33',31,100

UNION ALL SELECT 'Person34',32, 48 UNION ALL SELECT 'Person35',32, 50 UNION ALL SELECT 'Person36',32, 30

UNION ALL SELECT 'Person37',32, 50

UNION ALL

SELECT 'Person41',41, 88 UNION ALL SELECT 'Person42',41, 97 UNION ALL SELECT 'Person43',41,100

UNION ALL SELECT 'Person44',42, 48 UNION ALL SELECT 'Person45',42, 50 UNION ALL SELECT 'Person46',42, 30

UNION ALL SELECT 'Person47',42, 50

UNION ALL

SELECT 'Person51',51, 88 UNION ALL SELECT 'Person52',51, 97 UNION ALL SELECT 'Person53',51,100

UNION ALL SELECT 'Person54',51, 48 UNION ALL SELECT 'Person55',51, 50 UNION ALL SELECT 'Person56',51, 30

UNION ALL SELECT 'Person57',51, 50

UNION ALL

SELECT 'Person61',51, 88 UNION ALL SELECT 'Person62',51, 97 UNION ALL SELECT 'Person63',51,100

UNION ALL SELECT 'Person64',51, 48 UNION ALL SELECT 'Person65',51, 50 UNION ALL SELECT 'Person66',51, 30

UNION ALL SELECT 'Person67',51, 50

*/

set nocount on

declare @person varchar(10), @grp int, @score int, @prevgrp int,

@rank int, @prevscore int

select @rank = 0, @prevscore = -9999999, @prevgrp = -9999999

declare @ranking table (person varchar(10), grp int, score int, rank int)

DECLARE workCursor CURSOR fast_forward for --FORWARD_ONLY READ_ONLY STATIC FOR

SELECT person, grp, score

FROM dbo.#scores

ORDER BY grp asc, score desc

OPEN workCursor

FETCH NEXT FROM workCursor INTO @person, @grp, @score

WHILE (@@FETCH_STATUS = 0)

BEGIN

IF @grp = @prevgrp

BEGIN

IF @score <> @prevscore

BEGIN

SET @rank = @rank + 1

INSERT @ranking values (@person, @grp, @score, @rank)

END

ELSE

BEGIN

INSERT @ranking values (@person, @grp, @score, @rank)

END

SET @prevscore = @score

END --end of grpflag = 0

ELSE

BEGIN

SET @rank = 1

INSERT @ranking values (@person, @grp, @score, @rank)

SELECT @prevgrp = @grp, @prevscore = @score

END

FETCH NEXT FROM workCursor INTO @person, @grp, @score

END

CLOSE workCursor

DEALLOCATE workCursor

select person, r.grp, score, rank, cast((r.rank + 0.00)/t.reccount as decimal(3,2)) as Percentile

from @ranking r inner join (select grp, count(*) as reccount from @ranking group by grp) as t on r.grp = t.grp

--run this by itself after running the cursor

SELECT Person,

Grp,

Score,

r.Rank,

Percentile = CAST((r.Rank+0.0)/r.PeopleInGroup AS DECIMAL(3,2))

FROM (--==== Find rank and # of people in group so can find percentile above

SELECT Person,

Grp,

Score,

Rank = (--==== Creates running count using triangular join.

-- Groups are small so triangular join won't kill us here.

-- (Correlated subquery)

SELECT COUNT(*)+1

--Ties for first become "1"

FROM dbo.#Scores s2

WHERE s1.Grp = s2.Grp

AND s1.Score < s2.Score

--Ties for first become "1"

),

PeopleInGroup = (--==== Simple count of everyone in group

-- (Correlated subquery)

SELECT COUNT(*)

FROM dbo.#Scores s2

WHERE s1.Grp = s2.Grp

)

FROM dbo.#Scores s1

) r

ORDER BY Grp, Rank, Person

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Jeff Moden SSC Guru Points: 1004448 More actions · Answer 14

Heh... real problem is that Adam and a whole bunch of other folks think that what they've posted as a solution (including the one I "documentd"), are "Set Based". They, in fact, are not... Triangular Joins as what everyone has used, so far, are NOT set based... they violate the first rule of "touching each row just once".

Cursor and/or While loop will beat the hell out of triangular joins. But neither is required. I'm on vacation, right now, but if I have the time later, I'll see if I can write a set based demo that should blow the socks off the various "loop" solutions.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)