Returning the Top X row for each group

Question

Returning the Top X row for each group

Viewing 15 posts - 16 through 30 (of 44 total)

You must be logged in to reply to this topic. Login to reply

swoveland21 SSC Journeyman Points: 84 More actions · Answer 1

Thanks much for the info. I tried both versions on production data and found the row_number to be much faster, which took a second to run compared to 14 seconds to run using cross apply.

Account table has 121,000 records and call has 651,000 records. Due to account type clause there are only 8,000 acount and 98,000 call records affected. I wonder why the results are so different, acctid is primary key and index does exist on callactid.

14 seconds

select acctservareaid, acctid, acctname, acctstorenumber, callconfirmation, callaccttypeid, work_order

from account with (nolock)

cross apply (

select top 1 calltime, callconfirmation, callaccttypeid, callcokeagent + callpo work_order

from call with (nolock)

where callacctid = acctid

and callaccttypeid = 'c'

order by calltime desc) as call

where accttypeid = 'c'

1 second

with cteCall

as

(

select acctservareaid, acctid, acctname, acctstorenumber, calltime, callconfirmation, callaccttypeid, callcokeagent + callpo work_order

,row_number() over (partition by AcctID order by CallTime desc ) as RowN

from call with (nolock)

inner join account with (nolock) on callacctid = acctid

and callaccttypeid = 'c'

and accttypeid = 'c'

)

Select * from cteCall

where RowN <=1

order by AcctID, CallTime Desc

Jeff Moden SSC Guru Points: 1004403 More actions · Answer 2

Idea Deadbeat (12/6/2010)
Not directly related to article - but - an alternative to spt_values:
create table #Number (number int);
with N4000 as (select 0 as Number union all select Number+1 from N4000 where Number <4000
)insert into #number select * from N4000 option (MAXRECURSION 4000);
create index ix_N on #Number (Number);

I know you're only building it once but I'd never use the hidden RBAR of a recursive CTE to build an on-the-fly Tally Table for no other reason than someone may copy the code to do something larger and really end up with a performance problem.

I'd also use a clustered index on the table with a FILLFACTOR = 100 just in case someone wants to use the code on a server where the default FILLFACTOR has been changed to something other than 0 (100).

Here's what I mean... the following code includes several different methods including the original recursive CTE. Yeah... I even threw in a WHILE loop because it's a big surprise for a lot of folks when compared to a recursive CTE: (Didn't include index builds because they're all the same)

--===== Drop Tables ===========================================================================

DROP TABLE #Number, #Number1, #Number2, #Tally, #Tally1;

SET NOCOUNT ON;

DBCC FREEPROCCACHE;

WAITFOR DELAY '00:00:05'; --Let the system "settle" for several seconds

GO

--===== Recursive CTE w/Predefined Table (original)

create table #Number (number int);

with N4000 as (select 1 as Number union all select Number+1 from N4000 where Number <4000

)insert into #number select * from N4000 option (MAXRECURSION 4000);

GO

--===== Recursive CTE w/SELECT/INTO

with N4000 as (select ISNULL(1,0) as Number union all select Number+1 from N4000 where Number <4000

)select * INTO #Number1 from N4000 option (MAXRECURSION 4000);

GO

--===== WHILE Loop

CREATE TABLE #Number2 (Number INT NOT NULL);

DECLARE @Counter INT;

SELECT @Counter = 1;

BEGIN TRANSACTION;

WHILE @Counter <= 4000

BEGIN

INSERT INTO #Number2

(Number)

SELECT @Counter;

set @Counter = @Counter + 1;

END;

COMMIT;

GO

--===== Cross Join

SELECT TOP 4000

N = IDENTITY(INT,1,1)

INTO #Tally

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

;

GO

--===== Modified Itzek Method

WITH

E1(N) AS ( --=== Create Ten 1's

SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1 UNION ALL SELECT 1 --10

),

E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100

E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000

cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT N)) FROM E4)

SELECT N = ISNULL(N,0)

INTO #Tally1

FROM cteTally

WHERE N <= 4000

;

And, here are the results from SQL Profiler. Even the WHILE loop beat the recursive CTE! 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 3

swoveland21 (12/6/2010)
Account table has 121,000 records and call has 651,000 records. Due to account type clause there are only 8,000 acount and 98,000 call records affected. I wonder why the results are so different, acctid is primary key and index does exist on callactid.

Without seeing the plan its hard to say for definite , but as you require a large portion of the data (651K / 121 K = 20% roughly assuming that each account has 1 call ) , then the single scan through the call table is more efficient that the seeks to find the first call. In my example we were required to find a tiny portion on the data 1mill / 100 = 0.01% .

Clear Sky SQL
My Blog[/url]

Tim Widdup Grasshopper Points: 11 More actions · Answer 4

Ok , so how could you achieve this in SQL Server 2000? Again, just need the top x from each group ordered by the ID field.

Thanks

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 5

Tim Widdup (12/9/2010)
Ok , so how could you achieve this in SQL Server 2000? Again, just need the top x from each group ordered by the ID field.
Thanks

You will have to loop or cursor through the data. A "quirky update" may be possible, cant remember if possible in 2000 , sorry.

Clear Sky SQL
My Blog[/url]

david.soussan SSC Rookie Points: 43 More actions · Answer 6

Dave Ballantyne (12/9/2010)
Tim Widdup (12/9/2010)
Ok , so how could you achieve this in SQL Server 2000? Again, just need the top x from each group ordered by the ID field.
Thanks
You will have to loop or cursor through the data. A "quirky update" may be possible, cant remember if possible in 2000 , sorry.

Or you can do it with a self-join: http://davidsoussan.co.uk/2009/10/10/how-to-sequence-entries-in-each-sub-set/

SELECT t1.userid, COUNT(t1.tableid) AS sequence, t1.tableid, t1.tableid >= t2.tableid AS flg

FROM table t1 INNER JOIN table t2 ON t1.userid = t2.userid

GROUP BY t1.userid, t1.tableid, flg

HAVING flg = TRUE

That method is platform-agnostic so it will work on SQL2000, Access, MySQL, etc. I'd be interested to hear how it compares in performance to the platform-specific row_number(), it runs very efficiently on MySQL and even on Access/JET provided that the join column is properly indexed. You can use any column for the join, or even multiple columns with a multi-colmun index, just so long as it results in a unique index within each subset. Add your condition using the count/sequence number to return the x number of records for each group. My blog shows how to vary the condition and one or two usefull variations and applications of the method.

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 7

Hi David ,

Pretty poorly. Your method also performs poorly in relation to a traditional sub-select.

This may or may not be platform agnostic, although an important point its not a concern for me personally. BTW your code does need mods to run in a SQLServer environment 😉

See the attached image for a screen shot from profiler

drop table #testtab

go

create table #testtab

(

userid integer,

tableid integer

)

go

create unique clustered index idx1 on #testtab(userid,tableid)

go

with cter

as

(

select top(1000) row_number() over (order by (select null)) as r

from syscolumns a cross join syscolumns b cross join syscolumns c

)

insert into #testtab

Select r1.r,r2.r

from cter r1 cross join cter r2

select @@rowcount -- 1,000,000 rows

go

-- David Soussan Method

SELECT t1.userid, COUNT(t1.tableid) AS sequence, t1.tableid, case when t1.tableid >= t2.tableid then 1 else 0 end AS flg

FROM #testtab t1 INNER JOIN #testtab t2 ON t1.userid = t2.userid

GROUP BY t1.userid, t1.tableid, case when t1.tableid >= t2.tableid then 1 else 0 end

HAVING case when t1.tableid >= t2.tableid then 1 else 0 end = 1

go

-- count(*) sub select

select *,(Select count(*)

from #testtab t2

where t1.userid = t2.userid

and t2.TableId <= t1.tableId)

from #testtab t1

go

-- 2005 + , row_number()

Select * ,row_number() over (partition by userid order by tableid)

from #testtab

Clear Sky SQL
My Blog[/url]

david.soussan SSC Rookie Points: 43 More actions · Answer 8

Well Dave, I was hoping for some real benchmark data, but thanks anyway.

Of course the sql needs to be changed to the dialect specific to the platform - that is always a given. I'm not sure why you have to use CASE on sql server, but then I am a sql server newbie (not an sql newbie just a sql server newbie). Perhaps that is what makes it soooo slow and the syntax so ugly. On Access you have to use the HAVING clause for this to work but the boolean calculation is only a very small performance hit. On MySQL v 4 other problems arise, like the lack of table-type subquery support. v5 on the other hand would allow the sub-query select so that is just fine.

As I said, if all else fails then my method will work on any platform, just adjust the sql to the local dialect. If the platform has other tools then use them.

Jeff Moden SSC Guru Points: 1004403 More actions · Answer 9

david.soussan (12/10/2010)
Well Dave, I was hoping for some real benchmark data, but thanks anyway.

Setup SQL Profiler and you'll have it. 😉

So far as true portability across platforms goes, it's a myth.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004403 More actions · Answer 10

BTW... the problem with the first two queries (after the test data setup) is that they both have full blown Triangular Joins in them.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 11

Jeff Moden (12/12/2010)
Setup SQL Profiler and you'll have it.;-)

Dave Ballantyne (12/10/2010)
See the attached image for a screen shot from profiler

😉

Clear Sky SQL
My Blog[/url]

david.soussan SSC Rookie Points: 43 More actions · Answer 12

Dave, whoops, looks like we both missed the attachment, thanks.

Jeff Moden SSC Guru Points: 1004403 More actions · Answer 13

Dave Ballantyne (12/12/2010)
Jeff Moden (12/12/2010)
Setup SQL Profiler and you'll have it.;-)
Dave Ballantyne (12/10/2010)
See the attached image for a screen shot from profiler
😉

Yep... I definitely missed the attachment.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Thinknook SSCrazy Points: 2531 More actions · Answer 14

Madhivanan-208264 (12/6/2010)
More methods are available here
http://beyondrelational.com/blogs/madhivanan/archive/2008/09/12/return-top-n-rows.aspx

Awesome article Dave, and a great follow up (and hollistic) resource Old Hand. Thanks for sharing guys!

-----------------
... Then again, I could be totally wrong! Check the answer.
Check out posting guidelines here for faster more precise answers[/url].

I believe in Codd
... and Thinknook is my Chamber of Understanding

ganotedp SSC-Addicted Points: 492 More actions · Answer 15

ganotedp

SSC-Addicted

Points: 492

November 23, 2012 at 5:32 am

#1561845

Why not DENSE_RANK instead of ROW_NUMBER?