April 20, 2015 at 10:14 am
Hi,
I am trying to calculate percentile in SQL server exactly similar to MS-excel Percentile function where we give data set and percentile and gives threshold
I am trying to get different thresholds from data set at 0.2, 0.4, 0.6, 0.8
e.g.
I have data set
12000
6000
4000
15000
8000
4000
2000
9000
5000
3000
with excel function =PERCENTILE(A1:A10,0.2) I get result 3800
with excel function =PERCENTILE(A1:A10,0.4) I get result 4600
with excel function =PERCENTILE(A1:A10,0.6) I get result 6800
with excel function =PERCENTILE(A1:A10,0.8) I get result 9600
Please let me know if any function or any script can help to get same result
Thanks
April 20, 2015 at 10:35 am
Hi and welcome to the forums. In order to help we will need a few things:
1. Sample DDL in the form of CREATE TABLE statements
2. Sample data in the form of INSERT INTO statements
3. Expected results based on the sample data
Please take a few minutes and read the first article in my signature for best practices when posting questions.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 20, 2015 at 10:45 am
Hi
please find table with sample data
IF OBJECT_ID( 'tempdb..#temp_quintiles', 'U' ) IS NOT NULL
DROP TABLE #temp_quintiles;
Create table #temp_quintiles(
incomes int NULL);
insert into #temp_quintiles (incomes)
values
(12000),
(6000),
(4000),
(15000),
(8000),
(4000),
(2000),
(9000),
(5000),
(3000)
expected out put to get 0.2 percentile equivalent to excel percentile function 0.2 percentile result 3800
0.4 percentile result 4600
0.6 percentile result 6800
Thanks
April 20, 2015 at 6:50 pm
In SQL 2012 there is a function PERCENTILE_CONT that will do this for you.
At PERCENTILE_CONT(0.5) this is the median. Which can be calculated like this:
An Even Faster Method of Calculating the Median on a Partitioned Heap[/url]
I have played around with creating a replica of PERCENTILE_CONT on earlier versions of SQL Server. Median is difficult enough, but supporting the whole gamut of parameters you can pass as the percentile is quite the challenge.
Before I attempt to rise to that challenge (never got it fully working), I wanted to make sure you're not already on SQL 2012 and posting to the wrong forum.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
April 21, 2015 at 2:12 am
I am working on sql server 2008, can anyone help me with the logic please 🙂
April 21, 2015 at 2:20 am
akash_singh (4/21/2015)
I am working on sql server 2008, can anyone help me with the logic please 🙂
I was afraid you were going to say that.
One of the issues I encountered while attempting to replicate the formula is that I had no documentation describing the calculation. Is that described anywhere in the Excel documentation on line? Send a link if it is.
One issue with the sample data you provided is that it is too simplistic. What happens if the row count is not 10? That's where it really starts getting convoluted. I was trying to back into a formula, but was unable to do so.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
April 21, 2015 at 3:56 am
There is a Definition of the Microsoft Excel method for its PERCENTILE function on the linked Wiki page.
This implements that definition (not checked for all cases) and seems to return the correct value for the 20th percentile.
WITH RowsInSampleData AS
(
SELECT [rows]=COUNT(*)
FROM #temp_quintiles
),
Percentile AS
(
SELECT p=0.2
),
SampleData AS
(
SELECT incomes
,rn=ROW_NUMBER() OVER (ORDER BY incomes)
FROM #temp_quintiles a
),
CalculateRank AS
(
SELECT incomes, rn, [rows], [rank]=p*([rows]-1)+1, p
FROM SampleData
CROSS JOIN RowsInSampleData b
CROSS JOIN Percentile c
)
SELECT *, PERCENTILE=incomes+[rank]%1*(next_income-incomes)
FROM CalculateRank a
OUTER APPLY
(
SELECT TOP 1 incomes
FROM SampleData c
WHERE c.rn > a.rn
ORDER BY c.rn
) b (next_income)
WHERE rn=FLOOR([rank]);
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
April 22, 2015 at 6:43 pm
What? No thank you?
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
April 24, 2015 at 4:14 am
Hey Thanks a ton, i was testing the logic and its working fine with my data set. Thanks a lot once again 🙂
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply