CUME_DIST() calculates relative position of a value relative to a group of values. The value returned by CUME_DIST() is > 0 and <= 1, which represents percentage of number of rows with value less than (for ascending order) or equal to current row.
For example, consider below data:
Year Month Amount
2010 1 5000.00
2010 2 6000.00
2010 3 7000.00
2010 4 2000.00
2011 1 1000.00
2011 2 2000.00
2011 3 3000.00
2011 4 4000.00
Cumulative distribution of row 4 (year = 2010, month = 4) will be 37.50%, as number of rows which has value <= row 4 are 3 rows, and the value returned by CUME_DIST() will be 0.375.
Using CUME_DIST():
CUME_DIST() takes an ORDER BY clause as argument, ORDER BY clause determines the order in which the operation is performed:
SELECT [Year], [Month], [Amount],
CUME_DIST() OVER (ORDER BY [Amount]) CUME_DIST
–CAST(CUME_DIST() OVER (ORDER BY [Amount]) * 100 AS VARCHAR) + '%' CUME_DIST
FROM dbo.SalesData
ORDER BY [Amount]
Result Set:
Year Month Amount CUME_DIST
2011 1 1000.00 0.125
2011 2 2000.00 0.375
2010 4 2000.00 0.375
2011 3 3000.00 0.5
2011 4 4000.00 0.625
2010 1 5000.00 0.75
2010 2 6000.00 0.875
2010 3 7000.00 1
CUME_DUST() with PARTITION BY:
PARTITION BY clause divides the result set into partition to which CUME_DIST() is applied:
SELECT [Year], [Month], [Amount],
CUME_DIST() OVER (PARTITION BY [Year] ORDER BY [Amount])
CUME_DIST
FROM dbo.SalesData
ORDER BY [Year], [Month], [Amount]
Result Set:
Year Month Amount CUME_DIST
2010 1 5000.00 0.5
2010 2 6000.00 0.75
2010 3 7000.00 1
2010 4 2000.00 0.25
2011 1 1000.00 0.25
2011 2 2000.00 0.5
2011 3 3000.00 0.75
2011 4 4000.00 1
CUME_DIST() can also be used to find "TOP" rows, check below queries which selects 50% of data and the different result sets generated by them:
;WITH tempTable
AS
(
SELECT [Year], [Month], [Amount],
CUME_DIST() OVER (ORDER BY [Amount]) CUME_DIST
FROM dbo.SalesData
)
SELECT [Year], [Month], [Amount]
FROM tempTable
WHERE CUME_DIST < 0.50
SELECT TOP 50 PERCENT [Year], [Month], [Amount]
FROM dbo.SalesData
Result Sets:
Year Month Amount
———– ———– ———————
2011 1 1000.00
2011 2 2000.00
2010 4 2000.00
(3 row(s) affected)
Year Month Amount
———– ———– ———————
2011 1 1000.00
2011 2 2000.00
2011 3 3000.00
2011 4 4000.00
(4 row(s) affected)
CUME_DIST() has been available in Oracle since 8i, and can also be used as an aggregate function in Oracle.
Hope This Helps! Cheers!
Reference : Vishal (http://SqlAndMe.com)
Filed under: SQLServer, SQLServer "Denali"