calculate median value

  • hi,

    i wanted to calculate median value.

    have a table fact_abc which contains cust_key, fscore

    have a dimension dim_cust, cust_key and indicator

    i wrote a query like follows

    select cust_key,min(fscore),max(fscore),avg(fscore) from fact_abc a, dim_cust b where a.cust_key=b.cust_key and b.indicator=1

    group by cust_key

    the above query is woring fine.

    i want to calculate median(fscore) along with the above selected fields.

    please let me know how can i achieve it as we dont have median function in mssql. one more thing is my fact contains 15 million and dim contains 500000 records

  • Hi

    Maybe my English is too bad, but could you explain me the difference between "median" and "average"?

    Edited: Or I just don't get it... 😉

    Greets

    Flo

  • WITH CTE

    AS

    (

    SELECT a.cust_key,

    a.fscore,

    ROW_NUMBER() OVER(PARTITION BY a.cust_key ORDER BY a.fscore ASC) AS rn,

    COUNT(*) OVER(PARTITION BY a.cust_key) AS cn

    FROM fact_abc a

    INNER JOIN dim_cust b ON a.cust_key=b.cust_key AND b.indicator=1

    )

    SELECT cust_key,

    MIN(fscore) AS [Minimum],

    MAX(fscore) AS [Maximum],

    AVG(fscore) AS [Average],

    AVG(CASE WHEN rn BETWEEN cn-rn AND cn-rn+2 THEN fscore END) AS Median

    FROM CTE

    GROUP BY cust_key

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • I would make a couple of minor changes to Mark's query:

    1) Use COUNT(a.fscore) rather than COUNT(*) when calculating cn in order to avoid counting rows with NULL fscore values.

    2) Use following expression to calculate the median:

    AVG(CASE WHEN 2 * rn - cn BETWEEN 0 AND 2 THEN fscore END)

    When the number of non-NULL fscore values is odd, this expression will return the value of the single central value.

    When the number of non-NULL fscore values is even, this expression will return the average (i.e. mean) of the 2 central values.

    WITH CTE

    AS

    (

    SELECT a.cust_key,

    a.fscore,

    ROW_NUMBER() OVER (PARTITION BY a.cust_key ORDER BY a.fscore ASC) AS rn,

    COUNT(a.fscore) OVER(PARTITION BY a.cust_key) AS cn

    FROM fact_abc a

    INNER JOIN dim_cust b ON (a.cust_key = b.cust_key)

    WHERE (b.indicator = 1)

    )

    SELECT cust_key,

    MIN(fscore) AS [Minimum],

    MAX(fscore) AS [Maximum],

    AVG(fscore) AS [Average],

    AVG(CASE WHEN 2 * rn - cn BETWEEN 0 AND 2 THEN fscore END) AS Median

    FROM CTE

    GROUP BY cust_key

  • andrewd.smith (4/9/2009)


    I would make a couple of minor changes to Mark's query:

    1) Use COUNT(a.fscore) rather than COUNT(*) when calculating cn in order to avoid counting rows with NULL fscore values.

    2) Use following expression to calculate the median:

    AVG(CASE WHEN 2 * rn - cn BETWEEN 0 AND 2 THEN fscore END)

    When the number of non-NULL fscore values is odd, this expression will return the value of the single central value.

    When the number of non-NULL fscore values is even, this expression will return the average (i.e. mean) of the 2 central values.

    WITH CTE

    AS

    (

    SELECT a.cust_key,

    a.fscore,

    ROW_NUMBER() OVER (PARTITION BY a.cust_key ORDER BY a.fscore ASC) AS rn,

    COUNT(a.fscore) OVER(PARTITION BY a.cust_key) AS cn

    FROM fact_abc a

    INNER JOIN dim_cust b ON (a.cust_key = b.cust_key)

    WHERE (b.indicator = 1)

    )

    SELECT cust_key,

    MIN(fscore) AS [Minimum],

    MAX(fscore) AS [Maximum],

    AVG(fscore) AS [Average],

    AVG(CASE WHEN 2 * rn - cn BETWEEN 0 AND 2 THEN fscore END) AS Median

    FROM CTE

    GROUP BY cust_key

    I believe this is the "financial median", to get the "statistical median" do this

    AVG(CASE WHEN 2 * rn - cn BETWEEN 0 AND 2 THEN fscore END) AS FinancialMedian,

    MAX(CASE WHEN 2 * rn - cn BETWEEN 0 AND 2 THEN fscore END) AS StatisticalMedian

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Flo: An average is reached by summing the values and dividing by the number of values. A median is the value at the middle of an ordered set of values. A clumsy illustration follows:

    declare @temp table (value numeric (5,2) not null) ;

    --

    with Tally (N) AS

    (SELECT TOP 5 ROW_NUMBER() over (order by sc1.id)

    FROM Master.dbo.SysColumns sc1 )

    insert into @temp

    select N from tally ;

    --

    insert into @temp

    select top 2 value from @temp ;

    --

    select * from @temp order by value ;

    --

    with cteSummary as

    (select sum(value) as totalValue,count(value) as rowsCount,avg(value) as avgValue from @temp)

    ,cteTemp as

    (select value, ROW_NUMBER() over (order by value) as rowID from @temp)

    select *

    ,(ceiling(rowsCount/2.0)) as medianPosition

    ,(select value from cteTemp where rowID = ceiling(rowsCount/2.0)) as medianValue

    from cteSummary

    Obviously it's easy when there are an odd number of values. There are different policies which can be applied when there are an even number of values.

    http://en.wikipedia.org/wiki/Median

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Hi Bob!

    Thanks a lot for the English education! 🙂

    Greets

    Flo

  • Always happy to help, Flo. 🙂

    By the way, Mark is right. NULLS shouldn't count. My code above would give false results if null values were allowed into @temp.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply