Breaking data into Subsets and Calculating Mean, Median, Mode

  • Guys I need to break a data-set into around 900 subsets and then calculate mean, median, mode for each set... I also need to give a frequency distribution for each subsets... Is there a way to do through t-sql?

    Here is example of my data:

    ProductCode RefCode Amount

    ---------------------------

    abc 143 59000.89

    aaa 277 7000

    abc 143 50000.23

    abc 333 25000

    So in my data if I group by ProductCode and Refcode, I have around 900 subsets... Now for each of those subsets I have calculate mean, median, mode and create a frequency distribution for each subsets separately.. I am will using Amount for the calculations...

    In the example above abc 143 is one subset, aaa 277 is another and abc 333 is third subset... But data will have several records on each subsets... Please let me know if I do not make much sense here... Thanks in advance guys for the inputs.

  • Sounds like homework, but it was interesting enough to get my attention anyway.

    Maybe something like this:

    -- Test Data

    CREATE TABLE #T

    (Prod CHAR(3),

    Ref INT,

    Amount FLOAT) ;

    INSERT INTO #T

    (Prod, Ref, Amount)

    VALUES ('abc', 143, 59000.89),

    ('aaa', 277, 7000),

    ('abc', 143, 50000.23),

    ('abc', 333, 25000) ;

    -- Mean

    SELECT Prod,

    Ref,

    AVG(Amount) AS AvgAmount

    FROM #T

    GROUP BY Prod,

    Ref ;

    -- Median

    SELECT DISTINCT

    Prod,

    Ref,

    (TopHalfAmount + BottomHalfAmount) / 2

    FROM #T AS T1

    CROSS APPLY (SELECT TOP 1

    Amount AS TopHalfAmount

    FROM (SELECT TOP 50 PERCENT

    Amount

    FROM #T AS T2

    WHERE T2.Prod = T1.Prod

    AND T2.Ref = T1.Ref

    ORDER BY Amount) AS TopHalfSub

    ORDER BY Amount DESC) TopHalf

    CROSS APPLY (SELECT TOP 1

    Amount AS BottomHalfAmount

    FROM (SELECT TOP 50 PERCENT

    Amount

    FROM #T AS T3

    WHERE T3.Prod = T1.Prod

    AND T3.Ref = T1.Ref

    ORDER BY Amount DESC) AS BottomHalfSub

    ORDER BY Amount) BottomHalf

    -- Mode

    SELECT DISTINCT

    Prod,

    Ref,

    ModeAmount

    FROM #T AS T1

    CROSS APPLY (SELECT TOP 1

    Amount AS ModeAmount

    FROM #T AS T2

    WHERE T2.Prod = T1.Prod

    AND T2.Ref = T1.Ref

    GROUP BY Amount

    ORDER BY COUNT(*) DESC) AS ModeSub ;

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks GSquared ... I had used http://www.sqlservercentral.com/scripts/Miscellaneous/31775/ but yours is even better as it takes care of median for even sets correctly. Thanks!

    they want mode as well as subsets of data with frequency on amount as well. I guess using loop to create export and using the query used for mode should do it.

  • Nice GSquared!

    I just wondered as middle of implementing this When I just came and found some one already taken care.

  • Ghanta (3/9/2011)


    Thanks GSquared ... I had used http://www.sqlservercentral.com/scripts/Miscellaneous/31775/ but yours is even better as it takes care of median for even sets correctly. Thanks!

    they want mode as well as subsets of data with frequency on amount as well. I guess using loop to create export and using the query used for mode should do it.

    Why a loop?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply