November 14, 2011 at 12:49 pm
I have the following table that I need to get a median for one of the columns
mytable
State City Number
CA LA 20
CA LA 25
CA LA 15
CA SD 35
CA SD 12
CA SD 17
NY NYC 20
NY NYC 22
NY NYC 8
NY NYC 36
NY NYC 15
What I want is to create a new table that has the median for the
column Number like this:
State City Median
CA LA 20
CA SD 17
NY NYC 20
Any help would greatly be appreciated.
November 14, 2011 at 2:16 pm
This will get you there:
SELECT [State],
City,
AVG(NUMBER) AS Median
FROM (SELECT [State],
City,
NUMBER,
ROW_NUMBER() OVER (PARTITION BY [State], City ORDER BY NUMBER)
AS
RowNum,
COUNT(*) OVER (PARTITION BY [State], City)
AS
RowCnt
FROM mytable) x
WHERE RowNum IN ( ( RowCnt + 1 ) / 2, ( RowCnt + 2 ) / 2 )
GROUP BY [State],
City
ORDER BY [State],
City
Mike Scalise, PMP
https://www.michaelscalise.com
November 14, 2011 at 4:21 pm
That worked great! Thank you.
Maybe you could help me with this a bit more. What if I had two
columns that I needed to get the median in the same table
instead of column Number, what if I had Number1 and Number2.
How would I group by State and City and also get the median for those two columns?
November 14, 2011 at 7:33 pm
Can you provide an example?
Mike Scalise, PMP
https://www.michaelscalise.com
November 14, 2011 at 8:58 pm
An example would be:
mytable
State City Sales Qty
CA LA 1000 5
CA LA 2000 10
CA LA 2200 12
CA SD 3500 23
CA SD 2300 18
CA SD 4000 38
NY NYC 10000 65
NY NYC 12000 74
NY NYC 15000 88
NY NYC 16000 90
NY NYC 14500 80
newtable
state city median sales median qty
CA LA 2000 10
CA SD 3500 23
NY NYC 14500 80
Thanks for the help!
November 14, 2011 at 11:15 pm
FYI: Mean, median, and mode are three kinds of "averages".
Find the mean, median, mode, and range for the following list of values:
13, 18, 13, 14, 13, 16, 14, 21, 13
Mean = 15
Median = 14
Mode = 13
Mean, Median, Mode, and Range
November 15, 2011 at 8:58 am
wizard_froto (11/14/2011)
An example would be:mytable
State City Sales Qty
CA LA 1000 5
CA LA 2000 10
CA LA 2200 12
CA SD 3500 23
CA SD 2300 18
CA SD 4000 38
NY NYC 10000 65
NY NYC 12000 74
NY NYC 15000 88
NY NYC 16000 90
NY NYC 14500 80
newtable
state city median sales median qty
CA LA 2000 10
CA SD 3500 23
NY NYC 14500 80
Thanks for the help!
This isn't as efficient as the first one, but it should get you what you want.
SELECT [State],
City,
MAX(CASE WHEN RowNumSales IN ( ( RowCnt + 1 ) / 2, ( RowCnt + 2 ) / 2 ) THEN Sales ELSE NULL END) AS [Median Sales],
MAX(CASE WHEN RowNumQty IN ( ( RowCnt + 1 ) / 2, ( RowCnt + 2 ) / 2 ) THEN Qty ELSE NULL END) AS [Median Qty]
FROM (SELECT [State],
City,
Sales,
Qty,
ROW_NUMBER() OVER (PARTITION BY [State], City ORDER BY Sales) AS RowNumSales,
ROW_NUMBER() OVER (PARTITION BY [State], City ORDER BY Qty) AS RowNumQty,
COUNT(*) OVER (PARTITION BY [State], City) AS RowCnt
FROM mytable) x
GROUP BY [State],
City
ORDER BY [State],
City
Mike Scalise, PMP
https://www.michaelscalise.com
November 15, 2011 at 11:56 am
Works perfectly. Thanks Mikes84.
November 15, 2011 at 3:14 pm
A good rule of thumb to use is that you should not use code that you find on the Internet unless you understand what it is doing. If a problem arises, you'll need to be able to troubleshoot it yourself, which you won't be able to do if you don't understand the code. It's obvious that the "author" of this code didn't follow this rule.
"Author" removed
SELECT [State],
City,
MAX(CASE WHEN RowNumSales IN ( ( RowCnt + 1 ) / 2, ( RowCnt + 2 ) / 2 ) THEN Sales ELSE NULL END) AS [Median Sales],
MAX(CASE WHEN RowNumQty IN ( ( RowCnt + 1 ) / 2, ( RowCnt + 2 ) / 2 ) THEN Qty ELSE NULL END) AS [Median Qty]
FROM (SELECT [State],
City,
Sales,
Qty,
ROW_NUMBER() OVER (PARTITION BY [State], City ORDER BY Sales) AS RowNumSales,
ROW_NUMBER() OVER (PARTITION BY [State], City ORDER BY Qty) AS RowNumQty,
COUNT(*) OVER (PARTITION BY [State], City) AS RowCnt
FROM mytable) x
GROUP BY [State],
City
ORDER BY [State],
City
In order to calculate the fiscal median, you need to retrieve both the left statistical median and the right statistical median. By changing the aggregate from AVG() to MAX() this code changed from calculating the fiscal median to calculating the right statistical median, so there is no longer a reason to retrieve the left statistical median, but this code is doing so anyhow.
Due to the way that RowNumSales and RowNumQty are defined, we know the following:
* The value of the Sales(or Qty) field for RowNumSales(or RowNumQty) for the value (RowCnt + 1)/2 is always going to be the left statistical median.
* The value of the Sales(or Qty) field for RowNumSales(or RowNumQty) for the value (RowCnt + 2)/2 is always going to be the right statistical median.
* The Max() of the left and right statistical medians is always going to be equal to the right statistical median.
Furthermore, there is no reason to retrieve rows that are not the right statistical median for one of the two values, so we can filter out rows that don't meet one of those criteria to produce the following more efficient code.
SELECT [State],
City,
MAX(CASE WHEN RowNumSales = ( RowCnt + 2 ) / 2 THEN Sales ELSE NULL END) AS [Median Sales],
MAX(CASE WHEN RowNumQty = ( RowCnt + 2 ) / 2 THEN Qty ELSE NULL END) AS [Median Qty]
FROM (SELECT [State],
City,
Sales,
Qty,
ROW_NUMBER() OVER (PARTITION BY [State], City ORDER BY Sales) AS RowNumSales,
ROW_NUMBER() OVER (PARTITION BY [State], City ORDER BY Qty) AS RowNumQty,
COUNT(*) OVER (PARTITION BY [State], City) AS RowCnt
FROM mytable) x
WHERE RowNumSales = ( RowCnt + 2 ) / 2
OR RowNumQty = ( RowCnt + 2 ) / 2
GROUP BY [State],
City
ORDER BY [State],
City
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
September 19, 2013 at 11:12 am
Hi Mike,
I wish to thank you for your solution for calculating the median for multiple groupings. I have been trying to do this myself, spending a few hours. Your solution is elegantly succinct, accurate and a tool I can use repeatedly.
Thanks again.
Sincerely,
Lester Chin
February 27, 2014 at 1:58 pm
We began with a succession of efforts to apply Excel macro code entered as a function to obtain medians in a filtered worksheet list of scores of thousands of records. The code is widely-cited on the Internet but did not scale well when the worksheet contained more than a few hundred records. In the end it seemed to make more sense to preprocess in SQL before delivering the data to the end-users. The medians are an essential part of medical screening algorithms in use at our company. Thanks to all for this elegant code.
March 2, 2014 at 6:26 pm
Roy.G.Biv (2/27/2014)
We began with a succession of efforts to apply Excel macro code entered as a function to obtain medians in a filtered worksheet list of scores of thousands of records. The code is widely-cited on the Internet but did not scale well when the worksheet contained more than a few hundred records. In the end it seemed to make more sense to preprocess in SQL before delivering the data to the end-users. The medians are an essential part of medical screening algorithms in use at our company. Thanks to all for this elegant code.
You might be interested to know that there have been some advances in calculating median with higher performance:
Calculating the Median Value within a Partitioned Set Using T-SQL [/url]
Best approaches for grouped median
Note that the solution listed as 2005+2 in the second article is actually a solution that only works in SQL 2008. Although the author may at some point get this corrected.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply