Percentages performance

Question

Post reply

Percentages performance

Jules Bonnot

SSCertifiable

Points: 5488
More actions
May 5, 2009 at 7:20 am

#204277

Anyone know a more efficient way of doing this. Obviously my code scans the table three times. I imagine there is a more elegant way of doing this.
----create test data
SELECT TOP 1000000
IDENTITY(INT,1,1) AS N
INTO #numbers
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--- gets 75% and 25% values using parameterised top clauses
declare @max-2 int , @min-2 int
set @max-2 = 25
set @min-2 = 25
select min(n.n), max(n.n)
from #numbers n
left join
(
select n
from (
select top (@max) percent n from #numbers order by n desc
)x
union
select *
from (
select top (@min) percent n from #numbers order by n asc
)x
)y
on n.n = y.n
where y.n is null
drop table #numbers
thanks for all help
www.sql-library.com[/url]

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

I just tried this:

set nocount on;

if object_id(N'tempdb..#T') is not null

drop table #T;

if object_id(N'tempdb..#T2') is not null

drop table #T2;

create table #T (

ID int identity primary key,

Val int);

insert into #T (Val)

select abs(checksum(newid()))

from dbo.Numbers;

declare @max-2 int, @min-2 int;

select @max-2 = 25, @min-2 = 25;

set statistics time on;

declare @Row1 int, @Row2 int, @Rows float;

select @Rows = count(*)

from #T;

select

@Row1 = @Rows / (100.00/@Max) + 1,

@Row2 = @Rows - @Rows / (100.00/@Min);

select row_number() over (order by Val) as Row, Val

into #T2

from #T;

select T21.Val as BottomVal, T22.Val as TopVal

from #T2 T21, #T2 T22

where T21.Row = @Row1

and T22.Row = @Row2;

set statistics time off;

print '======================================='

set statistics time on;

select min(n.val), max(n.val)

from #T n

left join

(

select n

from (

select top (@max) percent val as n from #T order by val desc

)x

union

select *

from (

select top (@min) percent val as n from #T order by val asc

)x

)y

on n.val = y.n

where y.n is null;

set statistics time off;

The first solution typically took about half the time of the second one on my machine.

Also tried it with IO stats turned on. First solution takes 2 scans, 48 reads on the main table, and 2 scans 54 reads on the secondary table. Second solution takes 3 scans, 72 reads total.

Expanded it out to 1-million rows (from 10-thousand). At that size, the first solution took 2672 milliseconds CPU time, 2335 total. Second solution took 12829 milliseconds CPU, 19636 total. First one took 8 scans, 9416 reads, second one took 9 scans, 6327 reads.

So, if you have a lot of rows, the first solution is probably going to be significantly faster, but the second one has slightly better IO stats (1 more scan, but only 2/3rds of the reads).

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon