July 1, 2004 at 11:37 am
I have a ranking problem. I need to calculate rank and percentile for each group of people. Here is a sample data. The task is to rank people within each group based on result. Percentile is a person's rank divided by number of people in a group.
person group result
person1 1 88
person2 1 97
person3 1 100
person4 2 48
person5 2 50
person6 2 30
person7 2 50
The result should be like the following.
Notice that if people are tied with the same score, they get the best rank and the next person gets a rank that account for number of people tied with the same score (see persons4's rank and percentile).
person group result rank percentile
person1 1 88 3 .1
person2 1 97 2 .66
person3 1 100 1 .33
person4 2 48 3 .75
person5 2 50 1 .25
person6 2 30 4 .1
person7 2 50 1 .25
I came up with the solution that uses a cursor - for each group. Can this be done without using cursors at all? Temp tables are ok to use.
July 1, 2004 at 1:08 pm
Personally, I would do this sort of calculation at the client front end rather than the database back end but return the records in reverse result order.
I played around with this at the server end and found that a set based solution worked out rather expensive.
Basically, I created a table with the 3 existing fields, plus rank, percentile and an identity column.
I populated the table with my source records sorted in reverse order of result so the ID field acts a bit like a rank.
Then I found the minimum ID field for each score and populated all rank columns with identical scores with that minimum ID value.
I then created a looping routine using the rank field to update all records and set the rank to the next sequential number.
By this time the processing time had exceeded the time taken for a cursor to do the job and I hadn't started on the percentile bit yet
July 1, 2004 at 7:43 pm
In my opinion if you are going to do this server side you will be better of using one cursor than any other technique.
Steps would be:
1 create a temp table for final percentiles calculation.
2. create a select cursor ordered by Group AND result
3. use a varible to check the group change and reset ranking
4. for each pass increment rank if greater than previous and insert in temp table (restet when group changes)
5. compute percentiles set based on the temp table.
I agree with David in the sense that this is one task in which client side shines but if you must do it on the server at least now you have some pointers in what I consider the right direction
HTH
* Noel
July 2, 2004 at 3:48 am
Have you tried table Variables in SQL Server ? Try to use that instead of Cursor. it will improve the performace in comparison to Cursor.
--------------------------------
July 2, 2004 at 5:06 am
-- No cursors please
set nocount on
create table #scores( person char(10) not null, grp int not null, result int not null,
primary key(person,grp) )
insert #scores(person,grp,result)
select 'person1', 1, 88
union select 'person2',1,97
union select 'person3',1,100
union select 'person4',2,48
union select 'person5',2,50
union select 'person6',2,30
union select 'person7',2,50
select
person,
grp,
result,
rank,
percentile = convert(numeric(4,2), cast(rank as float) / people_in_grp )
from
( select
person,
grp,
result,
( select count(*) + 1 from #scores s2
where s1.grp = s2.grp
and s1.result < s2.result ) as rank,
( select count(*) from #scores s2
where s1.grp = s2.grp ) as people_in_grp
from
#scores s1 ) rankings
order by
person,
grp,
result
drop table #scores
set nocount off
--/rockmoose
You must unlearn what You have learnt
July 2, 2004 at 12:06 pm
rockmoose,
It's an excellent solution. I tested it and it seems to be working.
It took me a little while to figure out what (select count(*) + 1 from #scores s2 where s1.grp = s2.grp and s1.result < s2.result) part is doing.
Thank you
July 3, 2004 at 3:54 am
You are welcome ikorenbl!
Yeah the (select count(*) + 1 from #scores s2 where s1.grp = s2.grp and s1.result < s2.result) is a bit tricky.
One might be tempted to believe that (select count(*) from #scores s2 where s1.grp = s2.grp and s1.result <= s2.result)
would be logically equivalent.
But then the count will be wrong in those cases where there are several persons with the same score in the same group.
You must unlearn what you have learnt,
/rockmoose
You must unlearn what You have learnt
November 24, 2007 at 8:48 am
hi,
am new to programming and am doing a project now with asp.net technology..am in need of percentile function to be created...
and saw your post...
but could not understand completely.
wats convert numeric(4,2)
please let me know at the earliest as it is very urgent...!
November 24, 2007 at 10:10 am
-- No cursors please
/rockmoose
You must unlearn what You have learnt
Rockmoose! You ol' dog! Where in the heck have you been? How funny... I quoted your tagline just the other day... couldn't remember where I got it from... now, I remember 😀
Good to "see" you, again.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 24, 2007 at 10:12 am
ajethaa (11/24/2007)
hi,am new to programming and am doing a project now with asp.net technology..am in need of percentile function to be created...
and saw your post...
but could not understand completely.
wats convert numeric(4,2)
please let me know at the earliest as it is very urgent...!
"Numeric(4,2)" could also be "Decimal(4,2)"... Rockmoose is just formatting the output to have a max of two digits on either side of the decimal point.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 24, 2007 at 10:34 am
The code RockMoose wrote for solving the Rank/Percentile problem in SQL Server 2000 is a great example of "How to Avoid the Cursor/Loop". Cursor/Loop solution could not be used in a view, if needed... RockMoose' code can be.
Heh... it's such an easy to understand example that I added my own formatting "style" to it and added it to my "Libary of SQL Tricks". Thought I'd share it with everyone... Thanks, RockMoose... good to see you again.
/**********************************************************************************************
Purpose:
Demonstrate how to calculate "Rank" and "Percentile" in SQL Server 2000 without using a
Cursor or While Loop. Since the code has neither of those nor uses a temp table to get the
solution, the solution code could be used in a view.
Author: RockMoose at SQLServerCentral.com
Post: http://www.sqlservercentral.com/Forums/Topic124314-8-1.aspx#bm124434
Date: 24 Nov 2007
**********************************************************************************************/
--===== Setup environment
SET NOCOUNT ON --Supress auto-display of rowcounts for appearance & speed
--===== Create the temporary test table (NOT PART OF THE SOLUTION)
IF OBJECT_ID('TempDb..#Scores','U') IS NOT NULL
DROP TABLE #Scores
CREATE TABLE #Scores
(
Person VARCHAR(10),
Grp INT,
Score INT,
PRIMARY KEY (Person,Grp)
)
--===== Populate the test table with test data from the post (NOT PART OF THE SOLUTION)
INSERT INTO #Scores
(Person, Grp, Score)
SELECT 'Person1',1, 88 UNION ALL
SELECT 'Person2',1, 97 UNION ALL
SELECT 'Person3',1,100 UNION ALL
SELECT 'Person4',2, 48 UNION ALL
SELECT 'Person5',2, 50 UNION ALL
SELECT 'Person6',2, 30 UNION ALL
SELECT 'Person7',2, 50
--===== Demo the "cursorless" solution
-- Note that the "+0.0" in (r.Rank+0.0)/r.PeopleInGroup is an alternate method
-- for converting the two integers involved to FLOAT without explicitly making
-- the conversion... More of a personal preferance than anything else.
-- Think of "Percentile" as "this person is in the top x.xx% of his/her class".
SELECT Person,
Grp,
Score,
r.Rank,
Percentile = CAST((r.Rank+0.0)/r.PeopleInGroup AS DECIMAL(3,2))
FROM (--==== Find rank and # of people in group so can find percentile above
SELECT Person,
Grp,
Score,
Rank = (--==== Creates running count using triangular join.
-- Groups are small so triangular join won't kill us here.
-- (Correlated subquery)
SELECT COUNT(*)+1 --Ties for first become "1"
FROM #Scores s2
WHERE s1.Grp = s2.Grp
AND s1.Score < s2.Score --Ties for first become "1"
),
PeopleInGroup = (--==== Simple count of everyone in group
-- (Correlated subquery)
SELECT COUNT(*)
FROM #Scores s2
WHERE s1.Grp = s2.Grp
)
FROM #Scores s1
) r
ORDER BY Grp, Rank, Person
--Jeff Moden
Change is inevitable... Change for the better is not.
November 25, 2007 at 3:00 am
oh..!ok..got it .I used the code here.But am trying to create a function for percentile in sql 2000.
suppose there are N data points ,assume N =8,in a set, then if i enter a number to find the percentile value, say, 25 then it has to calculate N*25% i.e.,
(25/100)*8=2 (x=2)
if the x here is an integer, then it has to take the average of 2nd and 3rd data point and that will be the percentile value
if the x there is in decimal ,say some thing like 2.8 then it has to
take the 3rd data point and that ll be the percentile value.
I dont have inbuilt rank function and row_number function.
Can you please let me know how can this be done...!!
November 25, 2007 at 7:48 am
In the example previously given, there are 2 groups... 1 has 3 data points and the other has 4. Would you provide the "8 data points" for the example above so we can help you figure this out?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 26, 2007 at 10:15 am
Having read this blog post recently: http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/running-sums-redux.aspx, I decided to see if a cursor solution to this problem could be more efficient than the set-based one proposed. I haven't tested it with hundreds or thousands of rows, but I did come up with a cursor that is more efficient at least on reads for a smaller set of rows.
Populate the table, then run the cursor code and set code separately. On my laptop using Profiler to measure performance, the cursor code came in at 340 reads and 16 ms. The set code came in at 590 reads and 16 ms. Note that I did use a set-based approach for the percentile since it seemed most efficient. My gut feeling is that the larger the number of rows goes the more efficient the cursor will become. I hope someone builds an insert loop to scale it up to verify this. Time for me to start working some billable hours! :w00t:
I hope the code pastes ok. Haven't done that before.
/*
--populate the table
IF OBJECT_ID('TempDb..#Scores','U') IS NOT NULL
DROP TABLE dbo.#Scores
CREATE TABLE dbo.#Scores
(
Person VARCHAR(10),
Grp INT,
Score INT,
PRIMARY KEY (Person,Grp)
)
--===== Populate the test table with test data from the post (NOT PART OF THE SOLUTION)
INSERT INTO dbo.#Scores
(Person, Grp, Score)
SELECT 'Person1',1, 88 UNION ALL SELECT 'Person2',1, 97 UNION ALL SELECT 'Person3',1,100
UNION ALL SELECT 'Person4',2, 48 UNION ALL SELECT 'Person5',2, 50 UNION ALL SELECT 'Person6',2, 30
UNION ALL SELECT 'Person7',2, 50
UNION ALL
SELECT 'Person11',11, 88 UNION ALL SELECT 'Person12',11, 97 UNION ALL SELECT 'Person13',11,100
UNION ALL SELECT 'Person14',12, 48 UNION ALL SELECT 'Person15',12, 50 UNION ALL SELECT 'Person16',12, 30
UNION ALL SELECT 'Person17',12, 50
UNION ALL
SELECT 'Person21',21, 88 UNION ALL SELECT 'Person22',21, 97 UNION ALL SELECT 'Person23',21,100
UNION ALL SELECT 'Person24',2, 48 UNION ALL SELECT 'Person25',2, 50 UNION ALL SELECT 'Person26',2, 30
UNION ALL SELECT 'Person27',2, 50
UNION ALL
SELECT 'Person31',31, 88 UNION ALL SELECT 'Person32',31, 97 UNION ALL SELECT 'Person33',31,100
UNION ALL SELECT 'Person34',32, 48 UNION ALL SELECT 'Person35',32, 50 UNION ALL SELECT 'Person36',32, 30
UNION ALL SELECT 'Person37',32, 50
UNION ALL
SELECT 'Person41',41, 88 UNION ALL SELECT 'Person42',41, 97 UNION ALL SELECT 'Person43',41,100
UNION ALL SELECT 'Person44',42, 48 UNION ALL SELECT 'Person45',42, 50 UNION ALL SELECT 'Person46',42, 30
UNION ALL SELECT 'Person47',42, 50
UNION ALL
SELECT 'Person51',51, 88 UNION ALL SELECT 'Person52',51, 97 UNION ALL SELECT 'Person53',51,100
UNION ALL SELECT 'Person54',51, 48 UNION ALL SELECT 'Person55',51, 50 UNION ALL SELECT 'Person56',51, 30
UNION ALL SELECT 'Person57',51, 50
UNION ALL
SELECT 'Person61',51, 88 UNION ALL SELECT 'Person62',51, 97 UNION ALL SELECT 'Person63',51,100
UNION ALL SELECT 'Person64',51, 48 UNION ALL SELECT 'Person65',51, 50 UNION ALL SELECT 'Person66',51, 30
UNION ALL SELECT 'Person67',51, 50
*/
set nocount on
declare @person varchar(10), @grp int, @score int, @prevgrp int,
@rank int, @prevscore int
select @rank = 0, @prevscore = -9999999, @prevgrp = -9999999
declare @ranking table (person varchar(10), grp int, score int, rank int)
DECLARE workCursor CURSOR fast_forward for --FORWARD_ONLY READ_ONLY STATIC FOR
SELECT person, grp, score
FROM dbo.#scores
ORDER BY grp asc, score desc
OPEN workCursor
FETCH NEXT FROM workCursor INTO @person, @grp, @score
WHILE (@@FETCH_STATUS = 0)
BEGIN
IF @grp = @prevgrp
BEGIN
IF @score <> @prevscore
BEGIN
SET @rank = @rank + 1
INSERT @ranking values (@person, @grp, @score, @rank)
END
ELSE
BEGIN
INSERT @ranking values (@person, @grp, @score, @rank)
END
SET @prevscore = @score
END --end of grpflag = 0
ELSE
BEGIN
SET @rank = 1
INSERT @ranking values (@person, @grp, @score, @rank)
SELECT @prevgrp = @grp, @prevscore = @score
END
FETCH NEXT FROM workCursor INTO @person, @grp, @score
END
CLOSE workCursor
DEALLOCATE workCursor
select person, r.grp, score, rank, cast((r.rank + 0.00)/t.reccount as decimal(3,2)) as Percentile
from @ranking r inner join (select grp, count(*) as reccount from @ranking group by grp) as t on r.grp = t.grp
--run this by itself after running the cursor
SELECT Person,
Grp,
Score,
r.Rank,
Percentile = CAST((r.Rank+0.0)/r.PeopleInGroup AS DECIMAL(3,2))
FROM (--==== Find rank and # of people in group so can find percentile above
SELECT Person,
Grp,
Score,
Rank = (--==== Creates running count using triangular join.
-- Groups are small so triangular join won't kill us here.
-- (Correlated subquery)
SELECT COUNT(*)+1
--Ties for first become "1"
FROM dbo.#Scores s2
WHERE s1.Grp = s2.Grp
AND s1.Score < s2.Score
--Ties for first become "1"
),
PeopleInGroup = (--==== Simple count of everyone in group
-- (Correlated subquery)
SELECT COUNT(*)
FROM dbo.#Scores s2
WHERE s1.Grp = s2.Grp
)
FROM dbo.#Scores s1
) r
ORDER BY Grp, Rank, Person
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 26, 2007 at 10:44 am
Heh... real problem is that Adam and a whole bunch of other folks think that what they've posted as a solution (including the one I "documentd"), are "Set Based". They, in fact, are not... Triangular Joins as what everyone has used, so far, are NOT set based... they violate the first rule of "touching each row just once".
Cursor and/or While loop will beat the hell out of triangular joins. But neither is required. I'm on vacation, right now, but if I have the time later, I'll see if I can write a set based demo that should blow the socks off the various "loop" solutions.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 25 total)
You must be logged in to reply to this topic. Login to reply