Median Calculation

  • I am working on figuring out some medain calculations in our data and was getting fairly complicated when I ran across what seems to be a simple and quick way to do it:

    CREATE TABLE #median ( Value INT, i INT identity(1,1) )

     

    USE pubs

    GO

    INSERT INTO #median (Value)

     

    SELECT price

    FROM titles

    WHERE price IS NOT NULL and type = 'trad_cook'

    ORDER BY

        price ASC

     

    DECLARE @median AS INT

    SET @median =

    (SELECT

        AVG(Value)

    FROM

        #median

    WHERE

        i = (SELECT (MAX(i) + 1) / 2 + (MAX(i) + 1) % 2 FROM #median)

        OR

        i = (SELECT (MAX(i) + 1) / 2 FROM #median)

    )

     

    SELECT @median

     

    I've tested this on a few different data sets and it appears to work correctly.  I am having trouble breaking it down though so I have a thorough understanding of how it works.  Could someone explain it further to me and specifically the portion I've placed in red?

     

    It's been a long day and quite honestly I'm having a brain fart on tyring to figure this out.  Thanks.

  •  

    Hi,

    You can try this Query : i have taken the Pubs Database Title table .

    First i create a temp table like :

    1)  SELECT *  into ##titles

         FROM #titles

         WHERE price IS NOT NULL

         ORDER BY

         price ASC

    2)  Then i used this Query ,

    Declare @STR varchar(200),@var varchar(100)

    select  @var=case when count(*)%2=0 then  count(*) / 2 else (count(*) + 1)/2 end

    from #titles where price IS NOT NULL

    set @STR='select top 1  *  from ( select  top ' + @var  + '  * from #titles order by price asc ) a order by price desc'

    exec(@str)

    Thanks,

    Amit Gupta.

     

  • First of all we must say two-three things:

    1. The 'i' is just a sequencial number from 1 to the number of rows.

    2. When we execute a query like this "select avg(10)/3 ..." we will get the integer part of the division... which is 3.

    3. The Median value according to what your query (above) defines is the number/s that divides left & right a set into the two maximum sub-sets with the same size.

    i.e.:

    Case 1:

    Suppose we have the set K = {1,2,3,4,5}, size n=5.

    the median is the 3 because divides the K into

    {1,2} and {4,5} => the maximun subsets with the same size.

    Case 2:

    Suppose we have the set K = {1,2,3,4,5,6,7,8}, size n=8.

    the median is the {4,5} because divides the K into

    {1,2,3} and {6,7,8} => the maximun subsets with the same size.

    This is exactly what your query (IN RED) implements... if u make the calcs you will see:

    For the set {1,2,3,4,5} it will be:

    select avg(value)

    from #median

    where

    i = 3 --->avg(5 + 1)/2 + ((5+1) mod 2) = 3 + 0 = 3

    or

    i = 3 --->avg(5 + 1)/2 = 3

    and.... For the set {1,2,3,4,5,6,7,8} it will be:

    select avg(value)

    from #median

    where

    i = 5 --->avg(8 + 1)/2 + ((8+1) mod 2) = 4 + 1 = 5

    or

    i = 4 --->avg(8 + 1)/2 = 3

    .... actually its a pretty nice programming trick adjusted to the... dataset logic .... waw what i said!!!!

    PS: A mod B is the residual of the division A/B.--- oooo what a grind.

    ------------
    When you 've got a hammer, everything starts to look like a nail...

  • Thanks Andreas.  Your explaination and a few hours of sleep has cleared it up.

  •   any time...

    ------------
    When you 've got a hammer, everything starts to look like a nail...

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply