June 14, 2007 at 10:07 pm
I have a temp table that is being populated and below is a representation of the data. Basically, for each mile_code and rep combination there will be ‘X’ number of records. (In the example below there are 30 records.) What I need to do is eliminate the top and bottom 10% (This could be any percentage and I want to be able to change that easily if I have to.) of DeltaT for each mile_code and rep combination and then average the remaining values in DeltaT for each mile_code and rep combination.
I’ve tried a bunch of different approaches and none give me what I’m looking for. Any ideas or solutions?
mile_code DeltaT rep close_date
Approach 1 Coleman 6/5/2007 16:52
Approach 1 COLEMAN 6/5/2007 14:32
Approach 1 COLEMAN 6/5/2007 9:26
Approach 21 COLEMAN 5/31/2007 14:07
Approach 14 COLEMAN 5/31/2007 9:53
Approach 7 Coleman 5/30/2007 16:12
Approach 36 COLEMAN 5/29/2007 15:53
Approach 8 COLEMAN 5/29/2007 13:54
Approach 8 COLEMAN 5/29/2007 13:23
Approach 8 COLEMAN 5/29/2007 9:00
… (20 more records)
Interview 1 Coleman 6/5/2007 16:52
Interview 1 COLEMAN 6/5/2007 14:32
Interview 1 COLEMAN 6/5/2007 9:26
Interview 8 COLEMAN 5/29/2007 13:54
Interview 8 COLEMAN 5/29/2007 9:00
Interview 14 COLEMAN 5/25/2007 8:37
Interview 61 COLEMAN 5/24/2007 10:57
Interview 14 COLEMAN 5/23/2007 9:32
Interview 26 COLEMAN 5/18/2007 9:39
Interview 23 Coleman 5/14/2007 13:25
… (20 more records)
Followed by 30 each of the other 3 milecodes for Coleman
Approach 0 DOLAN 6/6/2007 9:27
Approach 1 DOLAN 6/5/2007 13:39
Approach 1 DOLAN 6/5/2007 12:20
Approach 1 DOLAN 6/5/2007 11:37
Approach 92 DOLAN 6/5/2007 11:12
Approach 91 DOLAN 6/5/2007 9:44
Approach 2 Dolan 6/4/2007 11:53
Approach 2 DOLAN 6/4/2007 11:11
Approach 5 Dolan 6/1/2007 16:07
Approach 177 DOLAN 6/1/2007 10:58
… (20 more records)
Interview 96 DOLAN 6/5/2007 16:38
Interview 167 DOLAN 6/5/2007 14:24
Interview 2 Dolan 6/4/2007 14:54
Interview 2 DOLAN 6/4/2007 11:11
Interview 5 Dolan 6/1/2007 16:07
Interview 27 DOLAN 6/1/2007 14:48
Interview 323 DOLAN 6/1/2007 13:49
Interview 15 Dolan 6/1/2007 11:48
Interview 224 DOLAN 5/31/2007 16:40
Interview 126 Dolan 5/30/2007 15:02
… (20 more records)
…
June 14, 2007 at 10:19 pm
Top by what?
DeltaT?
What if 50% of lines have DeltaT = 1?
What if there are 25 lines? How many to cut off?
_____________
Code for TallyGenerator
June 15, 2007 at 7:58 am
I thought of this on the way to work this morning and realized the way I described what I wanted was incorrect. If the data is sorted by DeltaT for each mile_code and rep combination then in this example (30 records for each group and 10% gone from top to bottom) the first 3 records and last 3 records should be removed. So it doesn't matter what the values of DeltaT are. If there were fractions caused by the number of records chosen and/or the percentage then I'd want it to just round to the nearest whole number.
June 15, 2007 at 12:09 pm
Have a look at SELECT TOP * 10 PERCENT ... ORDER BY and SELECT * TOP 10 PERCENT .... ORDER BY DESC
* Noel
June 15, 2007 at 12:46 pm
June 15, 2007 at 1:15 pm
If it has a key (and I called mine uid2 here) this may help. Not sure how fast it would run on a lot of rows.
select * from testperc t
left join
(select uid2 from (select top 15 percent uid2,deltat from testperc order by deltat desc) a
union
select uid2 from (select top 15 percent uid2,deltat from testperc order by deltat asc) b
) c
on t.uid2=c.uid2
where c.uid2 is null
June 15, 2007 at 5:13 pm
It has a key and I tried this but ended up with some strange results. I should have gotten back 120 records per rep (24 records each for 5 mile_codes) and for some reps I'd get way over that and others way under.
At any rate, I got some help from some other resources and this is what accomplished it in case anyone else has these issues. Requires SQL 2005 and I realize I probably should have posted this on the other forum.
Still have a little more work to do to get the final result before putting it into producton (Have to add some other calculations but will do that by joining in another temp table or something.) Thanks to the help of others this is what I ended up with:
DECLARE @percent int,
@recentindex int
SET @percent = 10
SET @recentindex = 45
Then crate temp table #miledata to get the last 'x' number of records for each rep in each mile_code that they have completed. Do this based off of the @recentindex variable.
Then use either of the following to get results. Haven't decided which I'll put into production yet.
OPTION A
/*This creates a common table expression to run our aggregate query against.*/
;WITH cte_AveRecords AS
(
SELECT row_number() OVER(partition BY mile_code, rep ORDER BY deltaT) AS rowNum, --gives a row number per group, ordered by deltaT
COUNT(*) OVER (partition BY mile_code, rep) AS totalRows, --gets total rows by mile_code and rep.
mile_code,
rep,
deltaT,
recentindex
FROM #miledata
)
/*This deterimines the average per mile_code and rep by selecting the rows that fall above the low percent and less than or equal to the upper percent.*/
SELECT c.mile_code,
c.rep,
CAST(AVG(CAST(a.deltaT AS Decimal(5,1))) AS Decimal(5,1)) AS AvgTime
FROM
(
SELECT mile_code, rep, deltaT, recentindex, rowNum, totalRows
FROM cte_AveRecords
WHERE rowNum > totalRows * (@percent / 100.0) --removes the bottom 'x' percent
AND rowNum < totalRows - (totalRows * (@percent / 100.0)) -- removes the top
)a
JOIN cte_AveRecords c ON a.rep = c.rep AND a.mile_code = c.mile_code
GROUP BY c.mile_code,
c.rep
ORDER BY c.rep,
c.mile_code
OPTION B
/*This creates a common table expression to run our aggregate query against.*/
;WITH cte_AveRecords AS
(
SELECT mile_code,
rep,
deltat,
/*This assigns a percentage to each row number in the mile_code and rep group. See explaination A below*/
((row_number() over(partition by mile_code, rep order by deltat)) * 100.00) / (count(*) over(partition by mile_code, rep)) as pct
FROM #miledata
)
/*This deterimines the average per mile_code and rep based on what we populated the CTE with..*/
SELECT mile_code,
rep,
CAST(AVG(CAST(deltat AS Decimal(5,1))) AS Decimal(5,1)) AS avg_deltat
FROM cte_AveRecords
WHERE pct > @percent and pct < (100.00 - @percent)
GROUP BY mile_code,
rep
ORDER BY rep,
mile_code
GO
June 15, 2007 at 5:26 pm
DELETE T
FROM Table T
INNER JOIN (SELECT TOP 10 PERCENT KeyCol
FROM Table ORDER BY DelteT ASC) T1 ON T.KeyCol = T1.KeyCol
INNER JOIN (SELECT TOP 10 PERCENT KeyCol
FROM Table ORDER BY DelteT DESC) T2 ON T.KeyCol = T2.KeyCol
_____________
Code for TallyGenerator
June 15, 2007 at 7:32 pm
Other queries did top/bottom 10% of the whole table which is why you got such a disparity in rows... Serqiy's does it correctly by the key column...
--Jeff Moden
Change is inevitable... Change for the better is not.
June 19, 2007 at 8:24 am
*If* you could add an Identity column to your data when you import it (ordered properly), you could:
Declare @Lo Int,@Hi Int
Select @Lo=Ident_Current('dbo.Data')*0.10,@Hi=Ident_Current('dbo.Data')-@Lo
Select * from dbo.Data where Ident between Ident_Current('dbo.Data')*0.10 and Ident_Current('dbo.Data')
Adjust things to suit...
June 19, 2007 at 8:25 am
Hmmm, forgot to adjust the code for the variables...
*If* you could add an Identity column to your data when you import it (ordered properly), you could:
Declare @Lo Int,@Hi Int
Select @Lo=Ident_Current('dbo.Data')*0.10,@Hi=Ident_Current('dbo.Data')-@Lo
Select * from dbo.Data where Ident between @Lo and @hi
Adjust things to suit...
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply