Eliminate top and bottom percent of records

Question

Post reply

Eliminate top and bottom percent of records

ehlinger

Hall of Fame

Points: 3585
More actions
June 14, 2007 at 10:07 pm

#117437

I have a temp table that is being populated and below is a representation of the data. Basically, for each mile_code and rep combination there will be ‘X’ number of records. (In the example below there are 30 records.) What I need to do is eliminate the top and bottom 10% (This could be any percentage and I want to be able to change that easily if I have to.) of DeltaT for each mile_code and rep combination and then average the remaining values in DeltaT for each mile_code and rep combination.

I’ve tried a bunch of different approaches and none give me what I’m looking for. Any ideas or solutions?

mile_code                 DeltaT                    rep                            close_date
Approach                 1                            Coleman                    6/5/2007 16:52
Approach                 1                            COLEMAN                 6/5/2007 14:32
Approach                 1                            COLEMAN                 6/5/2007 9:26
Approach                 21                          COLEMAN                 5/31/2007 14:07
Approach                 14                          COLEMAN                 5/31/2007 9:53
Approach                 7                            Coleman                    5/30/2007 16:12
Approach                 36                          COLEMAN                 5/29/2007 15:53
Approach                 8                            COLEMAN                 5/29/2007 13:54
Approach                 8                            COLEMAN                 5/29/2007 13:23
Approach                 8                            COLEMAN                 5/29/2007 9:00
… (20 more records)
Interview                  1                            Coleman                    6/5/2007 16:52
Interview                  1                            COLEMAN                 6/5/2007 14:32
Interview                  1                            COLEMAN                 6/5/2007 9:26
Interview                  8                            COLEMAN                 5/29/2007 13:54
Interview                  8                            COLEMAN                 5/29/2007 9:00
Interview                  14                          COLEMAN                 5/25/2007 8:37
Interview                  61                          COLEMAN                 5/24/2007 10:57
Interview                  14                          COLEMAN                 5/23/2007 9:32
Interview                  26                          COLEMAN                 5/18/2007 9:39
Interview                  23                          Coleman                    5/14/2007 13:25
… (20 more records)
Followed by 30 each of the other 3 milecodes for Coleman

Approach                 0                            DOLAN                      6/6/2007 9:27
Approach                 1                            DOLAN                      6/5/2007 13:39
Approach                 1                            DOLAN                      6/5/2007 12:20
Approach                 1                            DOLAN                      6/5/2007 11:37
Approach                 92                          DOLAN                      6/5/2007 11:12
Approach                 91                          DOLAN                      6/5/2007 9:44
Approach                 2                            Dolan                         6/4/2007 11:53
Approach                 2                            DOLAN                      6/4/2007 11:11
Approach                 5                            Dolan                         6/1/2007 16:07
Approach                 177                        DOLAN                      6/1/2007 10:58
… (20 more records)
Interview                  96                          DOLAN                      6/5/2007 16:38
Interview                  167                        DOLAN                      6/5/2007 14:24
Interview                  2                            Dolan                         6/4/2007 14:54
Interview                  2                            DOLAN                      6/4/2007 11:11
Interview                  5                            Dolan                         6/1/2007 16:07
Interview                  27                          DOLAN                      6/1/2007 14:48
Interview                  323                        DOLAN                      6/1/2007 13:49
Interview                  15                          Dolan                         6/1/2007 11:48
Interview                  224                        DOLAN                      5/31/2007 16:40
Interview                  126                        Dolan                         5/30/2007 15:02
… (20 more records)
…

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Sergiy SSC Guru Points: 110209 More actions · Answer 1

Top by what?

DeltaT?

What if 50% of lines have DeltaT = 1?

What if there are 25 lines? How many to cut off?

_____________
Code for TallyGenerator

ehlinger Hall of Fame Points: 3585 More actions · Answer 2

I thought of this on the way to work this morning and realized the way I described what I wanted was incorrect. If the data is sorted by DeltaT for each mile_code and rep combination then in this example (30 records for each group and 10% gone from top to bottom) the first 3 records and last 3 records should be removed. So it doesn't matter what the values of DeltaT are. If there were fractions caused by the number of records chosen and/or the percentage then I'd want it to just round to the nearest whole number.

noeld SSC Guru Points: 96590 More actions · Answer 3

Have a look at SELECT TOP * 10 PERCENT ... ORDER BY and SELECT * TOP 10 PERCENT .... ORDER BY DESC

* Noel

mrpolecat SSCertifiable Points: 7074 More actions · Answer 4

mrpolecat

SSCertifiable

Points: 7074

June 15, 2007 at 12:46 pm

#713372

does the table have a key?

mrpolecat SSCertifiable Points: 7074 More actions · Answer 5

If it has a key (and I called mine uid2 here) this may help. Not sure how fast it would run on a lot of rows.

select * from testperc t

left join

(select uid2 from (select top 15 percent uid2,deltat from testperc order by deltat desc) a

union

select uid2 from (select top 15 percent uid2,deltat from testperc order by deltat asc) b

) c

on t.uid2=c.uid2

where c.uid2 is null

ehlinger Hall of Fame Points: 3585 More actions · Answer 6

It has a key and I tried this but ended up with some strange results. I should have gotten back 120 records per rep (24 records each for 5 mile_codes) and for some reps I'd get way over that and others way under.

At any rate, I got some help from some other resources and this is what accomplished it in case anyone else has these issues. Requires SQL 2005 and I realize I probably should have posted this on the other forum.

Still have a little more work to do to get the final result before putting it into producton (Have to add some other calculations but will do that by joining in another temp table or something.) Thanks to the help of others this is what I ended up with:

DECLARE @percent int,

@recentindex int

SET @percent = 10

SET @recentindex = 45

Then crate temp table #miledata to get the last 'x' number of records for each rep in each mile_code that they have completed. Do this based off of the @recentindex variable.

Then use either of the following to get results. Haven't decided which I'll put into production yet.

OPTION A

/*This creates a common table expression to run our aggregate query against.*/

;WITH cte_AveRecords AS

(

SELECT row_number() OVER(partition BY mile_code, rep ORDER BY deltaT) AS rowNum, --gives a row number per group, ordered by deltaT

COUNT(*) OVER (partition BY mile_code, rep) AS totalRows, --gets total rows by mile_code and rep.

mile_code,

rep,

deltaT,

recentindex

FROM #miledata

)

/*This deterimines the average per mile_code and rep by selecting the rows that fall above the low percent and less than or equal to the upper percent.*/

SELECT c.mile_code,

c.rep,

CAST(AVG(CAST(a.deltaT AS Decimal(5,1))) AS Decimal(5,1)) AS AvgTime

FROM

(

SELECT mile_code, rep, deltaT, recentindex, rowNum, totalRows

FROM cte_AveRecords

WHERE rowNum > totalRows * (@percent / 100.0) --removes the bottom 'x' percent

AND rowNum < totalRows - (totalRows * (@percent / 100.0)) -- removes the top

)a

JOIN cte_AveRecords c ON a.rep = c.rep AND a.mile_code = c.mile_code

GROUP BY c.mile_code,

c.rep

ORDER BY c.rep,

c.mile_code

OPTION B

/*This creates a common table expression to run our aggregate query against.*/

;WITH cte_AveRecords AS

(

SELECT mile_code,

rep,

deltat,

/*This assigns a percentage to each row number in the mile_code and rep group. See explaination A below*/

((row_number() over(partition by mile_code, rep order by deltat)) * 100.00) / (count(*) over(partition by mile_code, rep)) as pct

FROM #miledata

)

/*This deterimines the average per mile_code and rep based on what we populated the CTE with..*/

SELECT mile_code,

rep,

CAST(AVG(CAST(deltat AS Decimal(5,1))) AS Decimal(5,1)) AS avg_deltat

FROM cte_AveRecords

WHERE pct > @percent and pct < (100.00 - @percent)

GROUP BY mile_code,

rep

ORDER BY rep,

mile_code

GO

Sergiy SSC Guru Points: 110209 More actions · Answer 7

DELETE T

FROM Table T

INNER JOIN (SELECT TOP 10 PERCENT KeyCol

FROM Table ORDER BY DelteT ASC) T1 ON T.KeyCol = T1.KeyCol

INNER JOIN (SELECT TOP 10 PERCENT KeyCol

FROM Table ORDER BY DelteT DESC) T2 ON T.KeyCol = T2.KeyCol

_____________
Code for TallyGenerator

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 8

Other queries did top/bottom 10% of the whole table which is why you got such a disparity in rows... Serqiy's does it correctly by the key column...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Peter E. Kierstead SSCarpal Tunnel Points: 4114 More actions · Answer 9

*If* you could add an Identity column to your data when you import it (ordered properly), you could:

Declare @Lo Int,@Hi Int

Select @Lo=Ident_Current('dbo.Data')*0.10,@Hi=Ident_Current('dbo.Data')-@Lo

Select * from dbo.Data where Ident between Ident_Current('dbo.Data')*0.10 and Ident_Current('dbo.Data')

Adjust things to suit...

PeteK
I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

Peter E. Kierstead SSCarpal Tunnel Points: 4114 More actions · Answer 10

Hmmm, forgot to adjust the code for the variables...

*If* you could add an Identity column to your data when you import it (ordered properly), you could:

Declare @Lo Int,@Hi Int

Select @Lo=Ident_Current('dbo.Data')*0.10,@Hi=Ident_Current('dbo.Data')-@Lo

Select * from dbo.Data where Ident between @Lo and @hi

Adjust things to suit...

PeteK
I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.