December 6, 2010 at 2:19 pm
Thanks much for the info. I tried both versions on production data and found the row_number to be much faster, which took a second to run compared to 14 seconds to run using cross apply.
Account table has 121,000 records and call has 651,000 records. Due to account type clause there are only 8,000 acount and 98,000 call records affected. I wonder why the results are so different, acctid is primary key and index does exist on callactid.
14 seconds
select acctservareaid, acctid, acctname, acctstorenumber, callconfirmation, callaccttypeid, work_order
from account with (nolock)
cross apply (
select top 1 calltime, callconfirmation, callaccttypeid, callcokeagent + callpo work_order
from call with (nolock)
where callacctid = acctid
and callaccttypeid = 'c'
order by calltime desc) as call
where accttypeid = 'c'
1 second
with cteCall
as
(
select acctservareaid, acctid, acctname, acctstorenumber, calltime, callconfirmation, callaccttypeid, callcokeagent + callpo work_order
,row_number() over (partition by AcctID order by CallTime desc ) as RowN
from call with (nolock)
inner join account with (nolock) on callacctid = acctid
and callaccttypeid = 'c'
and accttypeid = 'c'
)
Select * from cteCall
where RowN <=1
order by AcctID, CallTime Desc
December 6, 2010 at 6:36 pm
Idea Deadbeat (12/6/2010)
Not directly related to article - but - an alternative to spt_values:create table #Number (number int);
with N4000 as (select 0 as Number union all select Number+1 from N4000 where Number <4000
)insert into #number select * from N4000 option (MAXRECURSION 4000);
create index ix_N on #Number (Number);
I know you're only building it once but I'd never use the hidden RBAR of a recursive CTE to build an on-the-fly Tally Table for no other reason than someone may copy the code to do something larger and really end up with a performance problem.
I'd also use a clustered index on the table with a FILLFACTOR = 100 just in case someone wants to use the code on a server where the default FILLFACTOR has been changed to something other than 0 (100).
Here's what I mean... the following code includes several different methods including the original recursive CTE. Yeah... I even threw in a WHILE loop because it's a big surprise for a lot of folks when compared to a recursive CTE: (Didn't include index builds because they're all the same)
--===== Drop Tables ===========================================================================
DROP TABLE #Number, #Number1, #Number2, #Tally, #Tally1;
SET NOCOUNT ON;
DBCC FREEPROCCACHE;
WAITFOR DELAY '00:00:05'; --Let the system "settle" for several seconds
GO
--===== Recursive CTE w/Predefined Table (original)
create table #Number (number int);
with N4000 as (select 1 as Number union all select Number+1 from N4000 where Number <4000
)insert into #number select * from N4000 option (MAXRECURSION 4000);
GO
--===== Recursive CTE w/SELECT/INTO
with N4000 as (select ISNULL(1,0) as Number union all select Number+1 from N4000 where Number <4000
)select * INTO #Number1 from N4000 option (MAXRECURSION 4000);
GO
--===== WHILE Loop
CREATE TABLE #Number2 (Number INT NOT NULL);
DECLARE @Counter INT;
SELECT @Counter = 1;
BEGIN TRANSACTION;
WHILE @Counter <= 4000
BEGIN
INSERT INTO #Number2
(Number)
SELECT @Counter;
set @Counter = @Counter + 1;
END;
COMMIT;
GO
--===== Cross Join
SELECT TOP 4000
N = IDENTITY(INT,1,1)
INTO #Tally
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
GO
--===== Modified Itzek Method
WITH
E1(N) AS ( --=== Create Ten 1's
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 --10
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT N)) FROM E4)
SELECT N = ISNULL(N,0)
INTO #Tally1
FROM cteTally
WHERE N <= 4000
;
And, here are the results from SQL Profiler. Even the WHILE loop beat the recursive CTE! 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2010 at 2:35 am
swoveland21 (12/6/2010)
Account table has 121,000 records and call has 651,000 records. Due to account type clause there are only 8,000 acount and 98,000 call records affected. I wonder why the results are so different, acctid is primary key and index does exist on callactid.
Without seeing the plan its hard to say for definite , but as you require a large portion of the data (651K / 121 K = 20% roughly assuming that each account has 1 call ) , then the single scan through the call table is more efficient that the seeks to find the first call. In my example we were required to find a tiny portion on the data 1mill / 100 = 0.01% .
December 9, 2010 at 2:52 am
Ok , so how could you achieve this in SQL Server 2000? Again, just need the top x from each group ordered by the ID field.
Thanks
December 9, 2010 at 5:33 am
Tim Widdup (12/9/2010)
Ok , so how could you achieve this in SQL Server 2000? Again, just need the top x from each group ordered by the ID field.Thanks
You will have to loop or cursor through the data. A "quirky update" may be possible, cant remember if possible in 2000 , sorry.
December 10, 2010 at 2:21 am
Dave Ballantyne (12/9/2010)
Tim Widdup (12/9/2010)
Ok , so how could you achieve this in SQL Server 2000? Again, just need the top x from each group ordered by the ID field.Thanks
You will have to loop or cursor through the data. A "quirky update" may be possible, cant remember if possible in 2000 , sorry.
Or you can do it with a self-join: http://davidsoussan.co.uk/2009/10/10/how-to-sequence-entries-in-each-sub-set/
SELECT t1.userid, COUNT(t1.tableid) AS sequence, t1.tableid, t1.tableid >= t2.tableid AS flg
FROM table t1 INNER JOIN table t2 ON t1.userid = t2.userid
GROUP BY t1.userid, t1.tableid, flg
HAVING flg = TRUE
That method is platform-agnostic so it will work on SQL2000, Access, MySQL, etc. I'd be interested to hear how it compares in performance to the platform-specific row_number(), it runs very efficiently on MySQL and even on Access/JET provided that the join column is properly indexed. You can use any column for the join, or even multiple columns with a multi-colmun index, just so long as it results in a unique index within each subset. Add your condition using the count/sequence number to return the x number of records for each group. My blog shows how to vary the condition and one or two usefull variations and applications of the method.
December 10, 2010 at 3:39 am
Hi David ,
Pretty poorly. Your method also performs poorly in relation to a traditional sub-select.
This may or may not be platform agnostic, although an important point its not a concern for me personally. BTW your code does need mods to run in a SQLServer environment 😉
See the attached image for a screen shot from profiler
drop table #testtab
go
create table #testtab
(
userid integer,
tableid integer
)
go
create unique clustered index idx1 on #testtab(userid,tableid)
go
with cter
as
(
select top(1000) row_number() over (order by (select null)) as r
from syscolumns a cross join syscolumns b cross join syscolumns c
)
insert into #testtab
Select r1.r,r2.r
from cter r1 cross join cter r2
select @@rowcount -- 1,000,000 rows
go
-- David Soussan Method
SELECT t1.userid, COUNT(t1.tableid) AS sequence, t1.tableid, case when t1.tableid >= t2.tableid then 1 else 0 end AS flg
FROM #testtab t1 INNER JOIN #testtab t2 ON t1.userid = t2.userid
GROUP BY t1.userid, t1.tableid, case when t1.tableid >= t2.tableid then 1 else 0 end
HAVING case when t1.tableid >= t2.tableid then 1 else 0 end = 1
go
-- count(*) sub select
select *,(Select count(*)
from #testtab t2
where t1.userid = t2.userid
and t2.TableId <= t1.tableId)
from #testtab t1
go
-- 2005 + , row_number()
Select * ,row_number() over (partition by userid order by tableid)
from #testtab
December 10, 2010 at 4:39 am
Well Dave, I was hoping for some real benchmark data, but thanks anyway.
Of course the sql needs to be changed to the dialect specific to the platform - that is always a given. I'm not sure why you have to use CASE on sql server, but then I am a sql server newbie (not an sql newbie just a sql server newbie). Perhaps that is what makes it soooo slow and the syntax so ugly. On Access you have to use the HAVING clause for this to work but the boolean calculation is only a very small performance hit. On MySQL v 4 other problems arise, like the lack of table-type subquery support. v5 on the other hand would allow the sub-query select so that is just fine.
As I said, if all else fails then my method will work on any platform, just adjust the sql to the local dialect. If the platform has other tools then use them.
December 12, 2010 at 7:45 pm
david.soussan (12/10/2010)
Well Dave, I was hoping for some real benchmark data, but thanks anyway.
Setup SQL Profiler and you'll have it. 😉
So far as true portability across platforms goes, it's a myth.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 12, 2010 at 8:10 pm
BTW... the problem with the first two queries (after the test data setup) is that they both have full blown Triangular Joins in them.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 12, 2010 at 11:43 pm
Jeff Moden (12/12/2010)
Setup SQL Profiler and you'll have it.;-)
Dave Ballantyne (12/10/2010)
See the attached image for a screen shot from profiler
😉
December 13, 2010 at 6:50 am
Dave, whoops, looks like we both missed the attachment, thanks.
December 31, 2010 at 5:42 pm
Dave Ballantyne (12/12/2010)
Jeff Moden (12/12/2010)
Setup SQL Profiler and you'll have it.;-)Dave Ballantyne (12/10/2010)
See the attached image for a screen shot from profiler😉
Yep... I definitely missed the attachment.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 23, 2012 at 3:25 am
Madhivanan-208264 (12/6/2010)
More methods are available herehttp://beyondrelational.com/blogs/madhivanan/archive/2008/09/12/return-top-n-rows.aspx
Awesome article Dave, and a great follow up (and hollistic) resource Old Hand. Thanks for sharing guys!
-----------------
... Then again, I could be totally wrong! Check the answer.
Check out posting guidelines here for faster more precise answers[/url].
I believe in Codd
... and Thinknook is my Chamber of Understanding
November 23, 2012 at 5:32 am
Why not DENSE_RANK instead of ROW_NUMBER?
Viewing 15 posts - 16 through 30 (of 44 total)
You must be logged in to reply to this topic. Login to reply