top 80 percent from sum

Question

top 80 percent from sum

cometav2011

SSC Veteran

Points: 285
More actions
December 29, 2011 at 8:00 am

#254118

I tried function top 80 percent i sorted desd, but it still get some unwanted records.
6020%
5017%
5017%
4515%
3010%80%
258%
155%
103%
103%
total 295100%
i can do in Excel, i need to use all records >=80 percent and other just deleted. any good advice how to do inSQL 2008? thanks

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

roryp 96873 SSCertifiable Points: 6649 More actions · Answer 1

Select top 80 percent * from <sometable>

will get you 80 percent of the total number of records. Since you have 9 records in the table you show, it returns the top 8 records in your code.

If you want to get the top 80% of a cumulative sum of a numeric field, you have to do something like this (assuming you want them ordered in descending order):

--Create data

select * into databkup.dbo.testtbl

from (

select 60 [num] union all

select 50 [num] union all

select 45 [num] union all

select 30 [num] union all

select 25 [num] union all

select 15 [num] union all

select 10 [num] union all

select 10 [num]

) num

--Start code

;with cumsum as

(

select num, ROW_NUMBER() over (order by num desc) id

from testtbl

)

select c1.num, SUM(c2.num) csum from cumsum c1

join cumsum c2 on c1.id >= c2.id

group by c1.id, c1.num

having SUM(c2.num) <= .8 * (select SUM(num) from testtbl)

order by c1.num desc

cometav2011 SSC Veteran Points: 285 More actions · Answer 2

I want to get the top 80% of a cumulative sum of a numeric field. Thanks but i have more than 1 mln rows and i am newbees. i am looking for something more simple.

thanks

drew.allen SSC Guru Points: 77014 More actions · Answer 3

roryp 96873 (12/29/2011)
If you want to get the top 80% of a cumulative sum of a numeric field, you have to do something like this (assuming you want them ordered in descending order):
--Create data
select * into databkup.dbo.testtbl
from (
select 60 [num] union all
select 50 [num] union all
select 50 [num] union all
select 45 [num] union all
select 30 [num] union all
select 25 [num] union all
select 15 [num] union all
select 10 [num] union all
select 10 [num]
) num
--Start code
;with cumsum as
(
select num, ROW_NUMBER() over (order by num desc) id
from testtbl
)
select c1.num, SUM(c2.num) csum from cumsum c1
join cumsum c2 on c1.id >= c2.id
group by c1.id, c1.num
having SUM(c2.num) <= .8 * (select SUM(num) from testtbl)
order by c1.num desc

This uses a triangular join which grows geometrically and will quickly grind your server to halt. For a much better approach to calculating running totals read Jeff Moden's article Solving the Running Total and Ordinal Rank Problems (Rewritten)[/url]. Be sure to read all of the warnings and the discussion for the article as well.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

roryp 96873 SSCertifiable Points: 6649 More actions · Answer 4

Thanks Drew, I will check out that article. I am pretty new to SQL Server myself and that method was the first thing that popped into my head.

I am currently running it on a million rows on my local machine and can see what you mean about bringing things to a grinding halt.

drew.allen SSC Guru Points: 77014 More actions · Answer 5

cometav2011 (12/29/2011)
I want to get the top 80% of a cumulative sum of a numeric field. Thanks but i have more than 1 mln rows and i am newbees. i am looking for something more simple.
thanks

Running totals are inherently complex in T-SQL (2008). You can't expect a simple solution to an inherently complex problem. If you want a simple solution, you'll have to wait for SQL 2012, because they're introducing functions that will make this much simpler.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

roryp 96873 SSCertifiable Points: 6649 More actions · Answer 6

Cometav, perhaps this will help you.

Drew and/or others, I would appriciate any feedback on this method.

After reading Jeff Moden's article that Drew suggested (which was straightforward and very helpful), I came up with this solution. The problem is that it will only work with a clustered index on the table which may not be cometav's case, I am not sure. I did run this method on a table with a million rows and it finished in about 4 seconds. The only change I had to make was switching the csum column from an int to a bigint as the numbers started to get fairly large in my example.

select num, null csum into databkup.dbo.testtbl

from (

select 60 [num] union all

select 50 [num] union all

select 25 [num] union all

select 45 [num] union all

select 30 [num] union all

select 50 [num] union all

select 15 [num] union all

select 10 [num] union all

select 10 [num]

) num

create clustered index ixc_num

on dbo.testtbl (num)

--Using the "Quirky Update" from Jeff Moden's article to populate the csum column.

declare @csum bigint

declare @counter int

set @csum = 0

set @counter = 0

update testtbl

set @csum = csum = case when @counter = 1 then num

else @csum + num end,

@counter = @counter + 1

from testtbl

option (maxdop 1)

go

--Selecting only the rows with a csum under 80% of the total when in descending order.

select num

from testtbl

where ((select SUM(num) from testtbl)-csum)/convert(float, (select SUM(num) from testtbl)) <= .8

order by num desc

Again, any feedback on this would be greatly appriciated. As I said before I am still pretty new to SQL Server and I can manage to get most things done in there, but my efficiency at a lot of those things could use plenty of work.

SQL Padawan Right there with Babe Points: 792 More actions · Answer 7

Here is my solution...*edit had to remove numbers CTE :-P*

;WITH NUMBERS(value)

AS(SELECT 60

UNION ALL

SELECT 50

UNION ALL

SELECT 50

UNION ALL

SELECT 45

UNION ALL

SELECT 30

UNION ALL

SELECT 25

UNION ALL

SELECT 15

UNION ALL

SELECT 10

UNION ALL

SELECT 10

)

, RANKED

AS (SELECT rc = ROW_NUMBER() OVER (ORDER BY VALUE desc),

value,total = cast(value as float)/cast(x.total as float),

maxper = CAST(x.total * .8 as float)

FROM NUMBERS

CROSS APPLY (SELECT total = SUM(value) FROM NUMBERS) x )

, maxrecords

AS (SELECT TOP 1 record,min,max, perc = SUM(r2.total)

FROM (SELECT record = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

min = R1.rc,

max = R2.rc

FROM RANKED R1 CROSS JOIN RANKED R2

WHERE R1.rc < R2.rc AND R1.rc = 1

) NEWRANK CROSS JOIN RANKED r1 CROSS JOIN RANKED r2 --== All the magic is done here

WHERE min = r1.rc and max >= r2.rc

GROUP BY record,min,max

HAVING SUM(r2.total) <= .80 --This of course can be changed

ORDER BY record DESC

)

SELECT value,total FROM RANKED WHERE rc BETWEEN 1 AND (SELECT MAX FROM maxrecords)

cometav2011 SSC Veteran Points: 285 More actions · Answer 8

cometav2011

SSC Veteran

Points: 285

December 29, 2011 at 2:02 pm

#1426708

thanks all

Paul White SSC Guru Points: 150468 More actions · Answer 9

A SQL Server 2012 solution, for comparison's sake:

DECLARE @Data TABLE

(

item INTEGER NOT NULL

)

;

INSERT @Data

(item)

VALUES

(60),

(50),

(45),

(30),

(25),

(15),

(10),

(10)

;

SELECT

item,

pct_sum

FROM

(

SELECT

d.item,

pct_sum =

SUM(d.item) OVER (ORDER BY d.item DESC) * 100 /

SUM(d.item) OVER ()

FROM @Data AS d

) AS ps

WHERE

ps.pct_sum <= 80

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul White SSC Guru Points: 150468 More actions · Answer 10

roryp 96873 (12/29/2011)
After reading Jeff Moden's article that Drew suggested (which was straightforward and very helpful), I came up with this solution. The problem is that it will only work with a clustered index on the table which may not be cometav's case, I am not sure. I did run this method on a table with a million rows and it finished in about 4 seconds. The only change I had to make was switching the csum column from an int to a bigint as the numbers started to get fairly large in my example.

When I try this code on SQL Server 2008 R2 or SQL Server 2012, I get this error:

[font="Courier New"]Msg 425, Level 16, State 1, Line 7

Data type bigint of receiving variable is not equal to the data type int of column 'csum'.[/font]

Anyway, the 'quirky update' is problematic for a number of reasons. It does require, as you say, a particular clustered index on the table. Also, it persists the running total in a separate column, which again may not be suitable for the case at hand. The biggest objection to the method though is that it relies on undocumented (and therefore unsupported) behaviour. This is generally enough to dissuade people from using it in production. Alternative methods are available, as shown in the (rather long) comment thread associated with Jeff's article.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

roryp 96873 SSCertifiable Points: 6649 More actions · Answer 11

Thanks Paul, I haven't made it through the comments in that article yet, but I will be sure to read through them. That error came from me switching the cumulative sum column from an int to bigint. I think I left it as a bigint in the code I posted, switching it back to an int should make it work.

Paul White SSC Guru Points: 150468 More actions · Answer 12

roryp 96873 (12/30/2011)
Thanks Paul, I haven't made it through the comments in that article yet, but I will be sure to read through them. That error came from me switching the cumulative sum column from an int to bigint. I think I left it as a bigint in the code I posted, switching it back to an int should make it work.

Yep, no worries.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi