December 29, 2011 at 8:00 am
I tried function top 80 percent i sorted desd, but it still get some unwanted records.
6020%
5017%
5017%
4515%
3010%80%
258%
155%
103%
103%
total 295100%
i can do in Excel, i need to use all records >=80 percent and other just deleted. any good advice how to do inSQL 2008? thanks
December 29, 2011 at 8:41 am
Select top 80 percent * from <sometable>
will get you 80 percent of the total number of records. Since you have 9 records in the table you show, it returns the top 8 records in your code.
If you want to get the top 80% of a cumulative sum of a numeric field, you have to do something like this (assuming you want them ordered in descending order):
--Create data
select * into databkup.dbo.testtbl
from (
select 60 [num] union all
select 50 [num] union all
select 50 [num] union all
select 45 [num] union all
select 30 [num] union all
select 25 [num] union all
select 15 [num] union all
select 10 [num] union all
select 10 [num]
) num
--Start code
;with cumsum as
(
select num, ROW_NUMBER() over (order by num desc) id
from testtbl
)
select c1.num, SUM(c2.num) csum from cumsum c1
join cumsum c2 on c1.id >= c2.id
group by c1.id, c1.num
having SUM(c2.num) <= .8 * (select SUM(num) from testtbl)
order by c1.num desc
December 29, 2011 at 9:04 am
I want to get the top 80% of a cumulative sum of a numeric field. Thanks but i have more than 1 mln rows and i am newbees. i am looking for something more simple.
thanks
December 29, 2011 at 9:09 am
roryp 96873 (12/29/2011)
If you want to get the top 80% of a cumulative sum of a numeric field, you have to do something like this (assuming you want them ordered in descending order):
--Create data
select * into databkup.dbo.testtbl
from (
select 60 [num] union all
select 50 [num] union all
select 50 [num] union all
select 45 [num] union all
select 30 [num] union all
select 25 [num] union all
select 15 [num] union all
select 10 [num] union all
select 10 [num]
) num
--Start code
;with cumsum as
(
select num, ROW_NUMBER() over (order by num desc) id
from testtbl
)
select c1.num, SUM(c2.num) csum from cumsum c1
join cumsum c2 on c1.id >= c2.id
group by c1.id, c1.num
having SUM(c2.num) <= .8 * (select SUM(num) from testtbl)
order by c1.num desc
This uses a triangular join which grows geometrically and will quickly grind your server to halt. For a much better approach to calculating running totals read Jeff Moden's article Solving the Running Total and Ordinal Rank Problems (Rewritten)[/url]. Be sure to read all of the warnings and the discussion for the article as well.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 29, 2011 at 9:14 am
Thanks Drew, I will check out that article. I am pretty new to SQL Server myself and that method was the first thing that popped into my head.
I am currently running it on a million rows on my local machine and can see what you mean about bringing things to a grinding halt.
December 29, 2011 at 9:23 am
cometav2011 (12/29/2011)
I want to get the top 80% of a cumulative sum of a numeric field. Thanks but i have more than 1 mln rows and i am newbees. i am looking for something more simple.thanks
Running totals are inherently complex in T-SQL (2008). You can't expect a simple solution to an inherently complex problem. If you want a simple solution, you'll have to wait for SQL 2012, because they're introducing functions that will make this much simpler.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 29, 2011 at 10:56 am
Cometav, perhaps this will help you.
Drew and/or others, I would appriciate any feedback on this method.
After reading Jeff Moden's article that Drew suggested (which was straightforward and very helpful), I came up with this solution. The problem is that it will only work with a clustered index on the table which may not be cometav's case, I am not sure. I did run this method on a table with a million rows and it finished in about 4 seconds. The only change I had to make was switching the csum column from an int to a bigint as the numbers started to get fairly large in my example.
select num, null csum into databkup.dbo.testtbl
from (
select 60 [num] union all
select 50 [num] union all
select 25 [num] union all
select 45 [num] union all
select 30 [num] union all
select 50 [num] union all
select 15 [num] union all
select 10 [num] union all
select 10 [num]
) num
create clustered index ixc_num
on dbo.testtbl (num)
--Using the "Quirky Update" from Jeff Moden's article to populate the csum column.
declare @csum bigint
declare @counter int
set @csum = 0
set @counter = 0
update testtbl
set @csum = csum = case when @counter = 1 then num
else @csum + num end,
@counter = @counter + 1
from testtbl
option (maxdop 1)
go
--Selecting only the rows with a csum under 80% of the total when in descending order.
select num
from testtbl
where ((select SUM(num) from testtbl)-csum)/convert(float, (select SUM(num) from testtbl)) <= .8
order by num desc
Again, any feedback on this would be greatly appriciated. As I said before I am still pretty new to SQL Server and I can manage to get most things done in there, but my efficiency at a lot of those things could use plenty of work.
December 29, 2011 at 1:40 pm
Here is my solution...*edit had to remove numbers CTE :-P*
;WITH NUMBERS(value)
AS(SELECT 60
UNION ALL
SELECT 50
UNION ALL
SELECT 50
UNION ALL
SELECT 45
UNION ALL
SELECT 30
UNION ALL
SELECT 25
UNION ALL
SELECT 15
UNION ALL
SELECT 10
UNION ALL
SELECT 10
)
, RANKED
AS (SELECT rc = ROW_NUMBER() OVER (ORDER BY VALUE desc),
value,total = cast(value as float)/cast(x.total as float),
maxper = CAST(x.total * .8 as float)
FROM NUMBERS
CROSS APPLY (SELECT total = SUM(value) FROM NUMBERS) x )
, maxrecords
AS (SELECT TOP 1 record,min,max, perc = SUM(r2.total)
FROM (SELECT record = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
min = R1.rc,
max = R2.rc
FROM RANKED R1 CROSS JOIN RANKED R2
WHERE R1.rc < R2.rc AND R1.rc = 1
) NEWRANK CROSS JOIN RANKED r1 CROSS JOIN RANKED r2 --== All the magic is done here
WHERE min = r1.rc and max >= r2.rc
GROUP BY record,min,max
HAVING SUM(r2.total) <= .80 --This of course can be changed
ORDER BY record DESC
)
SELECT value,total FROM RANKED WHERE rc BETWEEN 1 AND (SELECT MAX FROM maxrecords)
December 29, 2011 at 2:02 pm
thanks all
December 30, 2011 at 3:14 am
A SQL Server 2012 solution, for comparison's sake:
DECLARE @data TABLE
(
item INTEGER NOT NULL
)
;
INSERT @data
(item)
VALUES
(60),
(50),
(50),
(45),
(30),
(25),
(15),
(10),
(10)
;
SELECT
item,
pct_sum
FROM
(
SELECT
d.item,
pct_sum =
SUM(d.item) OVER (ORDER BY d.item DESC) * 100 /
SUM(d.item) OVER ()
FROM @data AS d
) AS ps
WHERE
ps.pct_sum <= 80
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
December 30, 2011 at 3:39 am
roryp 96873 (12/29/2011)
After reading Jeff Moden's article that Drew suggested (which was straightforward and very helpful), I came up with this solution. The problem is that it will only work with a clustered index on the table which may not be cometav's case, I am not sure. I did run this method on a table with a million rows and it finished in about 4 seconds. The only change I had to make was switching the csum column from an int to a bigint as the numbers started to get fairly large in my example.
When I try this code on SQL Server 2008 R2 or SQL Server 2012, I get this error:
[font="Courier New"]Msg 425, Level 16, State 1, Line 7
Data type bigint of receiving variable is not equal to the data type int of column 'csum'.[/font]
Anyway, the 'quirky update' is problematic for a number of reasons. It does require, as you say, a particular clustered index on the table. Also, it persists the running total in a separate column, which again may not be suitable for the case at hand. The biggest objection to the method though is that it relies on undocumented (and therefore unsupported) behaviour. This is generally enough to dissuade people from using it in production. Alternative methods are available, as shown in the (rather long) comment thread associated with Jeff's article.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
December 30, 2011 at 7:59 am
Thanks Paul, I haven't made it through the comments in that article yet, but I will be sure to read through them. That error came from me switching the cumulative sum column from an int to bigint. I think I left it as a bigint in the code I posted, switching it back to an int should make it work.
December 30, 2011 at 8:11 am
roryp 96873 (12/30/2011)
Thanks Paul, I haven't made it through the comments in that article yet, but I will be sure to read through them. That error came from me switching the cumulative sum column from an int to bigint. I think I left it as a bigint in the code I posted, switching it back to an int should make it work.
Yep, no worries.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply