Help identifying the highest 80 percent of a set of numbers

  • Hi,

    I have some data which includes a percentage column, where the sum of the percentages adds up to 100.

    I want to identify the records with the largest percentages, in such a way that when sorted by percentage descending, the total percent equals (or goes over) 80%.

    So I have a dataset that looks like this:

    RecIDPCIsTop80

    rec16.75

    rec24.25

    rec31.27

    rec49.31

    rec57.02

    rec64.48

    rec78.96

    rec83.25

    rec97.21

    rec107.56

    rec118.46

    rec123.95

    rec133.86

    rec146.38

    rec152.18

    rec161.22

    rec174.42

    rec189.47

    And the results I want look like this:

    RecIDPCIsTop80

    rec189.471

    rec49.311

    rec78.961

    rec118.461

    rec107.561

    rec97.211

    rec57.021

    rec16.751

    rec146.381

    rec64.481

    rec174.421

    rec24.250

    rec123.950

    rec133.860

    rec83.250

    rec152.180

    rec31.270

    rec161.220

    So when it gets to rec17, the total percentage is 80.02, at which point I want to stop flagging them.

    Any ideas how I might do this without a cursor? The real datasets have thousands of rows, and there are thousands of datasets to do this on.

    Thanks in advance do your help!

  • To get the hits, try

    select a.RecID

    from sums a, sums b

    where b.PC >=a.PC

    group by a.RecID

    having sum(b.PC) >= 80.0

  • oops, sorry . upside down

  • Great, seems to do the trick.

    Thanks for your help!

  • select a.RecID , a.PC ,

    case when 100-sum(case when b.PC >a.PC then b.PC end) <20.0 then 0 else 1 end

    from sums a, sums b

    group by a.RecID, a.PC

    rec16 1.22 0

    rec3 1.27 0

    rec15 2.18 0

    rec8 3.25 0

    rec13 3.86 0

    rec12 3.95 0

    rec2 4.25 0

    rec17 4.42 1

    rec6 4.48 1

    rec14 6.38 1

    rec1 6.75 1

    rec5 7.02 1

    rec9 7.21 1

    rec10 7.56 1

    rec11 8.46 1

    rec7 8.96 1

    rec4 9.31 1

    rec18 9.47 1

  • I'll try that one out also, the first method is very slow on my large dataset.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply