December 4, 2010 at 1:06 pm
Comments posted to this topic are about the item Returning the Top X row for each group
December 6, 2010 at 1:22 am
A nice quest! I like the solution for its clever use of the index.
However, you depend on a table to deliver the lookup values for the age column (in this case master..spt_values), and you need to know in advance all the possible ages in your table (in this case 0...100).
Here's a suggestion:
select * from
(select distinct age as AgeGroup from #RunnersBig) S1
cross apply
(select top 2 * from #RunnersBig where age=agegroup order by time)S2
I suspect it is not as fast as your solution, as it scans the index to get the distinct ages. But it does not require any knowledge about the values in the age column.
I wonder if there is a solution that offers both: fast execution with an index seek, and independence from the value distribution in the age column.
Kay
December 6, 2010 at 2:29 am
More methods are available here
http://beyondrelational.com/blogs/madhivanan/archive/2008/09/12/return-top-n-rows.aspx
Failing to plan is Planning to fail
December 6, 2010 at 3:29 am
Kay,
You are absolutely correct: any solution that relies upon a tally table needs at least the required number of rows in that tally table.
If you used a tally table to build all the dates for the next 10 years, you would need to ensure that you have at least 3655 (ish) rows.
An alternative to your distinct method would be to grab the max(age), which will involve reading a single row from the index. Something like this...
with cteTally
as
(
select number from master..spt_values where type = 'P' and number >0 and number <=(Select max(age) from #RunnersBig)
)
select *
from cteTally
cross apply
(
select top 2 * from #RunnersBig where age=number order by time
) as Winners
December 6, 2010 at 4:59 am
Would this exact code work using TOAD, or will I get errors? Some keywords don't work in TOAD. Also, how would you sum() up the selected top rows for each group?
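On the sum() part of the question, one possible approach is to number the rows per age group first and then aggregate only the rows that made the cut. This is a minimal sketch, not the article's method: it assumes #Runners has the Age and Time columns used elsewhere in this thread and that Time is a numeric type that sum() accepts. The names Ranked, RowN and TotalTime are illustrative.

```sql
-- Sketch: total the Time of the top 2 rows in each Age group.
-- Assumes #Runners(Age, Time) with a numeric Time column.
with Ranked as
(
    select Age,
           Time,
           row_number() over (partition by Age order by Time) as RowN
    from #Runners
)
select Age,
       sum(Time) as TotalTime   -- sum over the rows kept by the filter below
from Ranked
where RowN <= 2                 -- keep only the top 2 rows per Age
group by Age
order by Age;
```

If Time is stored as a time or datetime type, it would need converting to a number (for example, seconds) before it can be summed.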
December 6, 2010 at 6:53 am
Great job and this is a reference I'll keep around.
December 6, 2010 at 7:49 am
Thank you for the article, very useful.
Is it safe to say that it only applies to SQL Server 2005 & 2008 but not to 2000?
December 6, 2010 at 7:56 am
I got this error message:
Msg 4108, Level 15, State 1, Line 3
Windowed functions can only appear in the SELECT or ORDER BY clauses.
when running the following code:
select * ,row_number() over (partition by Age order by Time ) as RowN
from #Runners
where row_number() over (partition by Age order by Time ) <=2
order by Age,Rown
From what I understand, row_number() cannot be used in the where clause?
December 6, 2010 at 8:02 am
@Mihai, yes, this is all 2005 (or greater).
That code was qualified with "In an ideal world we would be able to execute ..... However we cannot, so the currently suggested....."
So the error is expected.
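For readers who hit the same message, the usual workaround is to compute row_number() in a CTE or derived table and filter on its alias in the outer query, since windowed functions are only allowed in the SELECT and ORDER BY clauses. A minimal sketch, assuming the same #Runners table with Age and Time columns as in the failing query; the CTE name Ranked is illustrative:

```sql
-- Sketch: move the windowed function into a CTE, then filter on its alias.
-- Assumes #Runners has the Age and Time columns used in the query above.
with Ranked as
(
    select *,
           row_number() over (partition by Age order by Time) as RowN
    from #Runners
)
select *
from Ranked
where RowN <= 2        -- allowed here: RowN is an ordinary column of the CTE
order by Age, RowN;
```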
December 6, 2010 at 8:06 am
Not directly related to article - but - an alternative to spt_values:
create table #Number (number int);
with N4000 as
(
select 0 as Number
union all
select Number+1 from N4000 where Number <4000
)
insert into #Number select * from N4000 option (MAXRECURSION 4000);
create index ix_N on #Number (Number);
with cteTally
as
(
select Number from #Number where number >0 and number <=(Select max(age) from #RunnersBig)
)
select *
from cteTally
cross apply
(
select top 2 * from #RunnersBig where age=number order by time
) as Winners
December 6, 2010 at 9:05 am
Dave,
I liked this article, thanks!
One more thing: this optimization relies on the assumption that only a small percentage of runners are winners. It might be interesting to research which query is faster if everyone is a winner, and at what percentage of winners the tipping point lies, where both queries take the same time.
What do you think?
December 6, 2010 at 9:13 am
Thanks, Dave!! My bad - I didn't read it carefully enough!
December 6, 2010 at 9:27 am
@alex, thanks, glad you liked it.
The same thought had occurred to me, but the complication would be that it would not be entirely reproducible due to the random data.
Maybe a follow-up with published data is in order. Then, after that's all proved, decided and measured, I could break it all by adding another included column of dummy data.
@rockvilleaustin, no worries.
December 6, 2010 at 2:00 pm
...what about a tie?
INSERT INTO #Runners SELECT 9,10,20
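One way to handle ties, offered as a sketch rather than as the article's answer, is to swap row_number() for rank(), so that rows with the same Time in the same Age group share a position and are all returned. It assumes #Runners has the Age and Time columns used elsewhere in this thread; Ranked and RankN are illustrative names.

```sql
-- Sketch: rank() gives tied rows the same position, so ties are all kept.
-- Assumes #Runners has the Age and Time columns used earlier in the thread.
with Ranked as
(
    select *,
           rank() over (partition by Age order by Time) as RankN
    from #Runners
)
select *
from Ranked
where RankN <= 2       -- may return more than 2 rows per Age when times tie
order by Age, RankN;
```

Using dense_rank() instead would return every row holding one of the two best distinct times, and in the cross apply versions "top 2 with ties ... order by time" gives a similar effect to rank().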
December 6, 2010 at 2:08 pm
Thanks for the article Dave. It is a well explained example and easy to follow.
Cheers,
Nicole Bowman
Nothing is forever.