April 4, 2006 at 9:57 pm
I am working on figuring out some medain calculations in our data and was getting fairly complicated when I ran across what seems to be a simple and quick way to do it:
CREATE TABLE #median ( Value INT, i INT identity(1,1) )
USE pubs
GO
INSERT INTO #median (Value)
SELECT price
FROM titles
WHERE price IS NOT NULL and type = 'trad_cook'
ORDER BY
price ASC
DECLARE @median AS INT
SET @median =
(SELECT
AVG(Value)
FROM
#median
WHERE
i = (SELECT (MAX(i) + 1) / 2 + (MAX(i) + 1) % 2 FROM #median)
OR
i = (SELECT (MAX(i) + 1) / 2 FROM #median)
)
SELECT @median
I've tested this on a few different data sets and it appears to work correctly. I am having trouble breaking it down though so I have a thorough understanding of how it works. Could someone explain it further to me and specifically the portion I've placed in red?
It's been a long day and quite honestly I'm having a brain fart on tyring to figure this out. Thanks.
April 5, 2006 at 3:46 am
Hi,
You can try this Query : i have taken the Pubs Database Title table .
First i create a temp table like :
1) SELECT * into ##titles
FROM #titles
WHERE price IS NOT NULL
ORDER BY
price ASC
2) Then i used this Query ,
Declare @STR varchar(200),@var varchar(100)
select @var=case when count(*)%2=0 then count(*) / 2 else (count(*) + 1)/2 end
from #titles where price IS NOT NULL
set @STR='select top 1 * from ( select top ' + @var + ' * from #titles order by price asc ) a order by price desc'
exec(@str)
Thanks,
Amit Gupta.
April 5, 2006 at 6:23 am
First of all we must say two-three things:
1. The 'i' is just a sequencial number from 1 to the number of rows.
2. When we execute a query like this "select avg(10)/3 ..." we will get the integer part of the division... which is 3.
3. The Median value according to what your query (above) defines is the number/s that divides left & right a set into the two maximum sub-sets with the same size.
i.e.:
Case 1:
Suppose we have the set K = {1,2,3,4,5}, size n=5.
the median is the 3 because divides the K into
{1,2} and {4,5} => the maximun subsets with the same size.
Case 2:
Suppose we have the set K = {1,2,3,4,5,6,7,8}, size n=8.
the median is the {4,5} because divides the K into
{1,2,3} and {6,7,8} => the maximun subsets with the same size.
This is exactly what your query (IN RED) implements... if u make the calcs you will see:
For the set {1,2,3,4,5} it will be:
select avg(value)
from #median
where
i = 3 --->avg(5 + 1)/2 + ((5+1) mod 2) = 3 + 0 = 3
or
i = 3 --->avg(5 + 1)/2 = 3
and.... For the set {1,2,3,4,5,6,7,8} it will be:
select avg(value)
from #median
where
i = 5 --->avg(8 + 1)/2 + ((8+1) mod 2) = 4 + 1 = 5
or
i = 4 --->avg(8 + 1)/2 = 3
.... actually its a pretty nice programming trick adjusted to the... dataset logic .... waw what i said!!!!
PS: A mod B is the residual of the division A/B.--- oooo what a grind.
------------
When you 've got a hammer, everything starts to look like a nail...
April 5, 2006 at 8:46 am
Thanks Andreas. Your explaination and a few hours of sleep has cleared it up.
April 5, 2006 at 8:48 am
any time...
------------
When you 've got a hammer, everything starts to look like a nail...
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply