July 5, 2011 at 9:48 am
Hi there,
Does anyone have any useful code that I can use to show distributions of data ? Here's some test rows for 1 metric.
1
1
2
2
2
4
5
6
7
8
8
9
10
What I want to do is to break the rows into deciles (so, 10 segments) and count the number of rows in each decile. The range of the deciles needs to be calculated by finding the min and max values and then dividing the difference by 10 (almost, I think).
So, in the example, Decile 2 would equal 3, as there are 3 rows with a value of 2, and the decile range is 10 segments of 1.
I'll be doing some customer profiling work, and thought this would make it easier !
Regards, Greg
July 5, 2011 at 2:12 pm
Greg
Here is a solution based on your criteria, which is to simply use the minimum and maximum values of existing data as the endpoints of your distribution, then divide by 10. In essence, you are creating an interval width. Your situation might require this, but I think you might be well served by setting up some static frequency parameters. For example, with your data, the max(10) - min(1) / 10 = .9. For your data, it does not make sense to have the interval for the first decile to be 1 to 1.9, since there are no decimal values in your data. It might make sense to use the min and max possible values instead, then divide that value by 10. And maybe you are already doing this. Anyhow, see below.
IF OBJECT_ID('TempDB..#data','u') IS NOT NULL
DROP TABLE #data
GO
CREATE TABLE #data
(
Datum INT
)
GO
INSERT INTO #data
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 2 UNION ALL
SELECT 2 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 8 UNION ALL
SELECT 9 UNION ALL
SELECT 10
GO
DECLARE @min-2 INT, @max-2 INT, @width FLOAT
SELECT @min-2 = (SELECT MIN(datum) FROM #data)
SELECT @max-2 = (SELECT MAX(datum) FROM #data)
SELECT @width = (@max-@min)/(10*1.0)
SELECT
Datum,
Frequency = COUNT(datum),
Decile = CASE WHEN datum <= (@min + (@width * 1)) THEN '1st'
WHEN datum > (@min + (@width * 1)) AND datum <= (@min + (@width * 2)) THEN '2nd'
WHEN datum > (@min + (@width * 2)) AND datum <= (@min + (@width * 3)) THEN '3rd'
WHEN datum > (@min + (@width * 3)) AND datum <= (@min + (@width * 4)) THEN '4th'
WHEN datum > (@min + (@width * 4)) AND datum <= (@min + (@width * 5)) THEN '5th'
WHEN datum > (@min + (@width * 5)) AND datum <= (@min + (@width * 6)) THEN '6th'
WHEN datum > (@min + (@width * 6)) AND datum <= (@min + (@width * 7)) THEN '7th'
WHEN datum > (@min + (@width * 7)) AND datum <= (@min + (@width * 8)) THEN '8th'
WHEN datum > (@min + (@width * 8)) AND datum <= (@min + (@width * 9)) THEN '9th'
WHEN datum > (@min + (@width * 9)) AND datum <= @max-2 THEN '10th'
ELSE NULL
END
FROM #data
GROUP BY datum
Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
July 5, 2011 at 2:19 pm
Greg
On second glance, I think I got it wrong the first time. I think this is what you want.
DECLARE @min-2 INT, @max-2 INT, @width FLOAT
SELECT @min-2 = (SELECT MIN(datum) FROM #data)
SELECT @max-2 = (SELECT MAX(datum) FROM #data)
SELECT @width = (@max-@min)/(10*1.0)
SELECT
Decile,
Frequency = COUNT(decile)
FROM
(
SELECT
Datum,
Decile = CASE WHEN datum <= (@min + (@width * 1)) THEN 1
WHEN datum > (@min + (@width * 1)) AND datum <= (@min + (@width * 2)) THEN 2
WHEN datum > (@min + (@width * 2)) AND datum <= (@min + (@width * 3)) THEN 3
WHEN datum > (@min + (@width * 3)) AND datum <= (@min + (@width * 4)) THEN 4
WHEN datum > (@min + (@width * 4)) AND datum <= (@min + (@width * 5)) THEN 5
WHEN datum > (@min + (@width * 5)) AND datum <= (@min + (@width * 6)) THEN 6
WHEN datum > (@min + (@width * 6)) AND datum <= (@min + (@width * 7)) THEN 7
WHEN datum > (@min + (@width * 7)) AND datum <= (@min + (@width * 8)) THEN 8
WHEN datum > (@min + (@width * 8)) AND datum <= (@min + (@width * 9)) THEN 9
WHEN datum > (@min + (@width * 9)) AND datum <= @max-2 THEN 10
ELSE NULL
END
FROM #data
) t1
GROUP BY decile
Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
July 6, 2011 at 4:20 am
Thanks Greg,
That works really, well and I'll just add in the decile range valeus to the output now - great !
Greg
July 6, 2011 at 6:52 am
Do you care to see that decile 3 has 0 items?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
July 6, 2011 at 7:10 am
Not really, it wouldn't be a problem when plotting as a visualisation.
July 6, 2011 at 10:58 am
greg.bull (7/6/2011)
Not really, it wouldn't be a problem when plotting as a visualisation.
I only asked because if you are not interested in pulling a complete set of deciles from SQL Server then I wonder why you're doing this in T-SQL in the first place. I don't know the whole story though. If you are interested in presenting a 0 for the 3rd decile because you are simply dropping the results onto a web page in tabular format I could see using T-SQL for this, in which case you would want that 3 to appear in your results table with a 0 count. However, since you're plotting this (using some type of visualization tool?), is there not a built-in function for calculating deciles that could do this for you outside the database? Just wondering. What are you using to plot/display the results?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply