March 9, 2011 at 10:56 am
Guys I need to break a data-set into around 900 subsets and then calculate mean, median, mode for each set... I also need to give a frequency distribution for each subsets... Is there a way to do through t-sql?
Here is example of my data:
ProductCode RefCode Amount
---------------------------
abc 143 59000.89
aaa 277 7000
abc 143 50000.23
abc 333 25000
So in my data if I group by ProductCode and Refcode, I have around 900 subsets... Now for each of those subsets I have calculate mean, median, mode and create a frequency distribution for each subsets separately.. I am will using Amount for the calculations...
In the example above abc 143 is one subset, aaa 277 is another and abc 333 is third subset... But data will have several records on each subsets... Please let me know if I do not make much sense here... Thanks in advance guys for the inputs.
March 9, 2011 at 12:44 pm
Sounds like homework, but it was interesting enough to get my attention anyway.
Maybe something like this:
-- Test Data
CREATE TABLE #T
(Prod CHAR(3),
Ref INT,
Amount FLOAT) ;
INSERT INTO #T
(Prod, Ref, Amount)
VALUES ('abc', 143, 59000.89),
('aaa', 277, 7000),
('abc', 143, 50000.23),
('abc', 333, 25000) ;
-- Mean
SELECT Prod,
Ref,
AVG(Amount) AS AvgAmount
FROM #T
GROUP BY Prod,
Ref ;
-- Median
SELECT DISTINCT
Prod,
Ref,
(TopHalfAmount + BottomHalfAmount) / 2
FROM #T AS T1
CROSS APPLY (SELECT TOP 1
Amount AS TopHalfAmount
FROM (SELECT TOP 50 PERCENT
Amount
FROM #T AS T2
WHERE T2.Prod = T1.Prod
AND T2.Ref = T1.Ref
ORDER BY Amount) AS TopHalfSub
ORDER BY Amount DESC) TopHalf
CROSS APPLY (SELECT TOP 1
Amount AS BottomHalfAmount
FROM (SELECT TOP 50 PERCENT
Amount
FROM #T AS T3
WHERE T3.Prod = T1.Prod
AND T3.Ref = T1.Ref
ORDER BY Amount DESC) AS BottomHalfSub
ORDER BY Amount) BottomHalf
-- Mode
SELECT DISTINCT
Prod,
Ref,
ModeAmount
FROM #T AS T1
CROSS APPLY (SELECT TOP 1
Amount AS ModeAmount
FROM #T AS T2
WHERE T2.Prod = T1.Prod
AND T2.Ref = T1.Ref
GROUP BY Amount
ORDER BY COUNT(*) DESC) AS ModeSub ;
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 9, 2011 at 1:20 pm
Thanks GSquared ... I had used http://www.sqlservercentral.com/scripts/Miscellaneous/31775/ but yours is even better as it takes care of median for even sets correctly. Thanks!
they want mode as well as subsets of data with frequency on amount as well. I guess using loop to create export and using the query used for mode should do it.
March 9, 2011 at 1:31 pm
Nice GSquared!
I just wondered as middle of implementing this When I just came and found some one already taken care.
March 9, 2011 at 1:46 pm
Ghanta (3/9/2011)
Thanks GSquared ... I had used http://www.sqlservercentral.com/scripts/Miscellaneous/31775/ but yours is even better as it takes care of median for even sets correctly. Thanks!they want mode as well as subsets of data with frequency on amount as well. I guess using loop to create export and using the query used for mode should do it.
Why a loop?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply