September 12, 2011 at 2:43 pm
Does anyone have any way of calculating the percentile of a column of of data.
Ideally I need to show the 95th percentile.
I need to use the result in a NHS dataset report via Reporting Services.
As per my previuos topic i have used the excel function to see what result i should get. from the example data below i should see that the 95th percentile would be 754.25.
I have seen some examples elsewhere on the web but they seem to use UNION SELECT but I am unsure of how to apply it.
Any help wouuld be appreciated
Many Thanks
CREATE TABLE PS_TestForOnline
(
number NVARCHAR (20),
);
INSERT INTO PS_TestForOnline
VALUES('1' );
INSERT INTO PS_TestForOnline
VALUES('5' );
INSERT INTO PS_TestForOnline
VALUES('10' );
INSERT INTO PS_TestForOnline
VALUES('15' );
INSERT INTO PS_TestForOnline
VALUES('20' );
INSERT INTO PS_TestForOnline
VALUES('999' );
SELECT * FROM PS_TestForOnline
DROP TABLE PS_TestForOnline
September 12, 2011 at 2:46 pm
I'm not sure of how your calculation would work. Can you help explain what the '95th' percentile is? I can't seem to easily grok the math that came up with the number you provided.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
September 12, 2011 at 2:48 pm
Evil Kraig F (9/12/2011)
I'm not sure of how your calculation would work. Can you help explain what the '95th' percentile is? I can't seem to easily grok the math that came up with the number you provided.
My thoughts ed zachary.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 12, 2011 at 3:07 pm
cribbed this from another web page but feel that it explains it fairly well.
"Basically the 95th percentile says that 95% of the time, the usage is below this amount. Conversely of course, 5% of the time, usage is above that amount. The 95th percentile is a good number to use for planning so you can ensure you have the needed bandwidth at least 95% of the time."
The data that I will eventually apply this to is the time patients wait in our A&E dept. Of the minutes that people wait to be treated and seen I need to know that 95 % of the time patients are treated in X amount of time.
Thanks
September 12, 2011 at 3:11 pm
OK so how did you come up with 754.25 from your sample data? It would seem that with your data 95% of them are no greater than 20. If you can explain how you came up with 754.25 it should be pretty simple.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 12, 2011 at 3:38 pm
OK, perhaps my sample data wasnt the best example I could have used. I found this which i think may help.
"The algorithm is as follows: collect all the data samples for a period of time (commonly a day, a week, or a month), then sort the data set by value from highest to lowest and discard the highest 5% of the sorted samples. The next highest sample is the 95th percentile value for the data set. "
From what I've been told at work and from what i understand, you sort your data set and remove the top 5% of rows. If your dataset consisted of 100 rows sorted in order, your 95th percentile would be the value on the 96th row.
Sorry to be so vague. I will confirm the description at work tomorrow.
Thanks
September 12, 2011 at 3:44 pm
Well then....
select MAX(number) from
(
select top 95 percent number from PS_TestForOnline order by number
) MyAlias
You probably want a bit more sample data to prove this but it should be what you are looking for.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 12, 2011 at 3:48 pm
Here is a quick and dirty proof.
CREATE TABLE PS_TestForOnline
(
number int identity,
);
INSERT INTO PS_TestForOnline default values
go 100
select * from PS_TestForOnline
select MAX(number) from
(
select top 95 percent number from PS_TestForOnline order by number
) MyAlias
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 12, 2011 at 6:09 pm
I don't claim to be an expert in either SQL or Excel but I tested the method shown above with the original data values given and that method does not produce the correct answer. It would only produce the correct answer with the values 1,2,3,...99,100.
We need to understand what the term "percentile" means in Excel. I found the following on YouTube which provides a good explanation.
http://www.youtube.com/watch?v=XllEMSjVGGk
Using the method described on YouTube, I came up with the following SQL that uses the values given. I'm sure that I am not using the best SQL methods, but at least it gives the correct answer of 754.25. I hope someone else here can improve the code.
CREATE TABLE PS_TestForOnline
(
rowId int identity,
minutes int,
);
INSERT INTO PS_TestForOnline
VALUES(1);
INSERT INTO PS_TestForOnline
VALUES(5);
INSERT INTO PS_TestForOnline
VALUES(10);
INSERT INTO PS_TestForOnline
VALUES(15);
INSERT INTO PS_TestForOnline
VALUES(20);
INSERT INTO PS_TestForOnline
VALUES(999);
SELECT * FROM PS_TestForOnline
DECLARE @PERCENT REAL
SET @PERCENT = .95
PRINT @PERCENT
DECLARE @COUNT INT
SELECT @COUNT=COUNT(*) FROM PS_TestForOnline
PRINT @COUNT
DECLARE @POSITION REAL
DECLARE @BETWEEN REAL
DECLARE @ROW1 INT
DECLARE @ROW2 INT
SET @POSITION = @PERCENT * @COUNT + (1 - @PERCENT)
PRINT @POSITION
SET @ROW1 = @POSITION
SET @ROW2 = @ROW1 + 1
PRINT @ROW1
PRINT @ROW2
SET @BETWEEN = @POSITION - @ROW1
PRINT @BETWEEN
DECLARE @MINUTES1 INT
DECLARE @MINUTES2 INT
SELECT @MINUTES1 = minutes FROM PS_TestForOnline WHERE rowId = @ROW1
PRINT @MINUTES1
SELECT @MINUTES2 = minutes FROM PS_TestForOnline WHERE rowId = @ROW2
PRINT @MINUTES2
DECLARE @DISTANCE INT
SET @DISTANCE = @MINUTES2 - @MINUTES1
PRINT @DISTANCE
DECLARE @add REAL
SET @add = @BETWEEN * @DISTANCE
PRINT @add
DECLARE @PERCENTILE REAL
SET @PERCENTILE = @MINUTES1 + @add
PRINT @PERCENTILE
DROP TABLE PS_TestForOnline
Selected values from table:
rowId minutes
11
25
310
415
520
6999
Printed values:
0.95 (@PERCENT)
6 (@COUNT)
5.75 (@POSITION)
5 (@ROW1)
6 (@ROW2)
0.75 (@BETWEEN)
20 (@MINUTES1)
999 (@MINUTES2)
979 (@DISTANCE)
734.25 (@ADD)
754.25 (@PERCENTILE) This is the desired correct answer.
gmrose
September 12, 2011 at 6:59 pm
gmrose (9/12/2011)
I don't claim to be an expert in either SQL or Excel but I tested the method shown above with the original data values given and that method does not produce the correct answer. It would only produce the correct answer with the values 1,2,3,...99,100.
That really depends on what the "correct" answer is. It is accurate if as the OP stated they wanted to remove the top 5% of the values. I guess we will find out tomorrow what the definition is. 😉
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 12, 2011 at 7:18 pm
He's doing a statistical analysis based on a normal distribution. he's looking for the point in a normal distibution with an average and standard deviation commensurate with his set of numbers where 95% of the distribution is less than Z.
In basic stats (I'm sure tom will be by and whip this sucker out a lot better), this approximates to:
x < Avg(n) + 1.641 * stdev(n)
In this case avg(n) =175 and stdev(n) = 403.733, which leads t something like:
x < 838 and change
Of course - this is fairly arbitrary, since the example he shows doesn't look like a normal distribution at all (at least not with the numbers he gave). a 5-number distribution with that kind of wild spread is nothing except undefined.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
September 12, 2011 at 8:00 pm
Matt Miller (#4) (9/12/2011)
He's doing a statistical analysis based on a normal distribution. he's looking for the point in a normal distibution with an average and standard deviation commensurate with his set of numbers where 95% of the distribution is less than Z.In basic stats (I'm sure tom will be by and whip this sucker out a lot better), this approximates to:
x < Avg(n) + 1.41 * stdev(n)
In this case avg(n) =175 and stdev(n) = 403.733, which leads t something like:
x < 744 and change
Of course - this is fairly arbitrary, since the example he shows doesn't look like a normal distribution at all (at least not with the numbers he gave). a 5-number distribution with that kind of wild spread is nothing except undefined.
Well, thank you for clarifying. I think I'll leave this to the folks who got past Basic Statistics (or remember more then I do). If a math formula swings past here I can grok, I'll help with the code though!
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
September 13, 2011 at 2:50 am
Thank you all for your replies.
I now have the definition from which i am expected to work to.
"The 95th pecentile is taken as the value at position "0.95 x(number of rows -1) +1"
My interpretation is that when the data is ordered I am looking for the value that is next following the 95th % row.
so if we have a random selection of 100 rows sorted acending then the value in row 96 is what I would expect to be the 95th percentile.
Hope this helps and sorry for the original confusion.
September 13, 2011 at 3:29 am
Try this:
Based on the definition you gave, you can enter the percentile in the declare bit, or hard code it instead of the @percentile instead if you are always going to use 0.95 - let me know if it works!
IF OBJECT_ID('tempdb..#DATA') IS NOT NULL DROP TABLE #data
CREATE TABLE #data (number INT)
INSERT INTO #data SELECT 15 as number union all
SELECT 20 as number union all
SELECT 35 as number union all
SELECT 40 as number union all
SELECT 50 as number
DECLARE @percentile DECIMAL(18,3)
SET @Percentile = 0.40
;with percentile as
(
select
CAST(number AS DECIMAL(18,3)) as number
,row_number() OVER (PARTITION BY (SELECT NULL) order by number asc) as row
from #data
)
,percentile2 AS
(
SELECT
number
,row
,MAX(row) OVER (PARTITION BY (SELECT NULL)) as max_row
,(@percentile * (MAX(row) OVER (PARTITION BY (SELECT NULL)) - 1)) + 1 as percentile_row
FROM percentile
)
select
AVG(number) as percentile_value
from percentile2
WHERE ABS(percentile_row - row) < 1
September 13, 2011 at 8:00 am
Thanks to Davin21 (and everyone elses comments) That seems to work fine.
I need now to add group the result by a month and/or field.
I am a bit unsure as to how to incorporate your script into a group script.
Could anyone please advise?
thanks in advance
Viewing 15 posts - 1 through 15 (of 22 total)
You must be logged in to reply to this topic. Login to reply