March 27, 2009 at 11:53 am
I have a situation where I need to find the top 10 entries per month for the last 5 years. One way to do it is to create 60 different select top 10...order by statements all combined by union all, but I'm guessing there's a more elegant way to do the same thing.
Here is some code to give an idea of what data I'm working with:
drop table #temp;
drop table #nums;
with t1 as (select 1 n union all select 1 n),
t2 as (select 1 n from t1 a, t1 b),
t3 as (select 1 n from t2 a, t2 b),
t4 as (select 1 n from t3 a, t3 b),
t5 as (select 1 n from t4 a, t4 b),
t6 as (select 1 n from t5 a, t5 b),
t7 as (select 1 n from t6 a, t6 b),
t8 as (select top 1000000 ROW_NUMBER() over (order by (select 1)) n from t7)
select CAST(n as int) n
into #nums
from t8
select CAST(n as int) record_id,
Cast(n % 2048 as int) sales_rep_id,
Cast(
Cast(Left(NEWID(),4) as varbinary(8)) as int
) % 384 sales_quantity,
DATEADD(n,0-Cast(Cast(Left(NewID(),4) as varbinary(4)) as int)%2628000,GetDate()) sale_datetime
into #temp
from #nums
--
create clustered index ix_ix on #temp(sale_datetime)
create nonclustered index ix_ix2 on #temp (sales_quantity) include (sales_rep_id)
I want to get the top 10 records (order by sale_quantity desc) by month & year (3/2009, 2/2009, 1/2009, 12/2008, 11/2008, etc).
March 27, 2009 at 12:04 pm
If there are ties, do you want all the people, or just the top 10 rows.
For example, if in December 2008, the top 9 had a sales volume of 500, and then there was a six-place tie for 10th, with 499 each, would you want 10 rows, and too-bad-so-sad for the random 5 who are left off, or do you want 15 rows? What if there's a 5-way tie for 1st place in a month? Do you want 10 rows, or 15?
Also, if a sales_rep_id has more than one entry in a month, should they be summed up, or just take the highest number?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 27, 2009 at 12:12 pm
-- EDIT: I just saw Gsquared's post, did not mean to double answer. My solution does not take into consideration ties or if a rep is there more than once. I can rework the query but it should be a good starting point.
Top 10 sales per month
;WITH SalesGroupNumbered
(SalesYear,SalesMonth,SalesRankInMonth,SalesId,SalesRepId,SalesQuantity,SalesDateTime)
AS
(SELECT
YEAR(sale_datetime)
,MONTH(sale_datetime)
,ROW_NUMBER() over (PARTITION BY YEAR(sale_datetime),MONTH(sale_datetime)
ORDER BY sale_datetime ASC, sales_quantity DESC) GroupRowId
,*
FROM #temp)
SELECT *
FROM SalesGroupNumbered
WHERE SalesRankInMonth < 11
ORDER BY SalesYear,SalesMonth,SalesRankInMonth
March 27, 2009 at 12:18 pm
Yeah, I'm looking at the same type of solution, but wanted to determine if it should be Row_Number, Rank, or Dense_Rank.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 27, 2009 at 12:47 pm
GSquared, I want to return 10 regardless if their are ties. The data set I gave you isn't realistic and has a lot more ties, but for the actual data, there very rarely will be any ties.
We already are using Rank but it is a big table, and proves to be really slow. Selecting by each month (see query below) is actually faster than using Rank(). But you can see from the code why I don't want to use this approach. Compare the IO stats of each of the queries and you'll see why:
Rank():
SET STATISTICS IO ON
SET STATISTICS TIME ON
select * from
(select Row_Number() OVER (partition by DateAdd(m,DateDiff(m,0,sale_datetime),0) ORDER BY sales_quantity desc) [rank],
t1.*
from #temp t1) subtable
where [rank] <= 10
Versus:
select * from
(select top 10 *
from #temp
where sale_datetime >= '3/1/2009' and sale_datetime < '4/1/2009'
order by sales_quantity desc) t1
UNION ALL
select * from
(select top 10 *
from #temp
where sale_datetime >= '2/1/2009' and sale_datetime < '3/1/2009'
order by sales_quantity desc) t2
UNION ALL
select * from
(select top 10 *
from #temp
where sale_datetime >= '1/1/2009' and sale_datetime < '2/1/2009'
order by sales_quantity desc) t3
UNION ALL
select * from
(select top 10 *
from #temp
where sale_datetime >= '12/1/2008' and sale_datetime < '1/1/2009'
order by sales_quantity desc) t4
UNION ALL
select * from
(select top 10 *
from #temp
where sale_datetime >= '11/1/2008' and sale_datetime < '12/1/2008'
order by sales_quantity desc) t5
UNION ALL
select * from
(select top 10 *
from #temp
where sale_datetime >= '10/1/2008' and sale_datetime < '11/1/2008'
order by sales_quantity desc) t6
UNION ALL
select * from
(select top 10 *
from #temp
where sale_datetime >= '9/1/2008' and sale_datetime < '10/1/2008'
order by sales_quantity desc) t7
UNION ALL
select * from
(select top 10 *
from #temp
where sale_datetime >= '8/1/2008' and sale_datetime < '9/1/2008'
order by sales_quantity desc) t8
UNION ALL
select * from
(select top 10 *
from #temp
where sale_datetime >= '7/1/2008' and sale_datetime < '8/1/2008'
order by sales_quantity desc) t9
UNION ALL
select * from
(select top 10 *
from #temp
where sale_datetime >= '6/1/2008' and sale_datetime < '7/1/2008'
order by sales_quantity desc) t10
UNION ALL
select * from
(select top 10 *
from #temp
where sale_datetime >= '5/1/2008' and sale_datetime < '6/1/2008'
order by sales_quantity desc) t11
UNION ALL
select * from
(select top 10 *
from #temp
where sale_datetime >= '4/1/2008' and sale_datetime < '5/1/2008'
order by sales_quantity desc) t12
(I'm not going to union all 60 months, but I think that's enough to show you which one performs better)
March 27, 2009 at 12:54 pm
FYI I just realized I put top 1000 in the title. Please ignore that, it is misleading... I really meant top 10
March 27, 2009 at 1:14 pm
Hi Gabriel
So if the performance of your UNION is well, I would just avoid the copy-pasting:
DECLARE @month DATETIME
SET @month = DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()) + 1, 0)
DECLARE @sql VARCHAR(MAX)
SELECT @sql = (SELECT CASE WHEN N = 1 THEN '' ELSE 'UNION ' END +
' SELECT TOP 10 * ' +
' FROM #temp t' + CONVERT(VARCHAR(2), N) +
' WHERE sale_date >= ''' + CONVERT(VARCHAR(40), DATEADD(MONTH, (N - 1) * -1, @month), 127) + '''' +
' AND sale_date < ''' + CONVERT(VARCHAR(40), DATEADD(MONTH, N * -1, @month), 127) + '''' +
CHAR(10)
FROM Tally
WHERE N <= 60 FOR XML PATH(''))
SELECT @sql = REPLACE(REPLACE(@sql, '<', ' ')
PRINT @sql
March 27, 2009 at 1:17 pm
I forgot to mention another thing, dynamic sql is strong opposed in this environment :angry:
March 27, 2009 at 2:04 pm
If hard-coding the date ranges in there works, then yes, that definitely outperforms the Rank/Row_Number version.
An alternative to hard-coding it would be something like this:
declare @sql varchar(max);
select @sql = coalesce(@SQL + 'select * from (select top 10 *
from #temp
where sale_datetime >= ''' + cast(dateadd(month, number, '3/1/2004') as varchar(100)) + ''' and sale_datetime < ''' + cast(dateadd(month, number+1, '3/1/2004') as varchar(100)) + '''
order by sales_quantity desc) X union all
',
'select * from (select top 10 *
from #temp
where sale_datetime >= ''' + cast(dateadd(month, number, '3/1/2004') as varchar(100)) + ''' and sale_datetime < ''' + cast(dateadd(month, number+1, '3/1/2004') as varchar(100)) + '''
order by sales_quantity desc) X union all
', @sql)
from dbo.Numbers
where number between 0 and 60;
select @sql = reverse(stuff(reverse(@SQL), 1, 12, ';')); -- gets rid of final Union All
print @sql; -- For debugging
exec (@SQL);
With that, it'll build all the Union All queries for you. You can also set the date range by using an input parameter instead of the date I hard-coded into it.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 27, 2009 at 2:06 pm
Ah. Didn't see the prohibition against dynamic SQL. In that case, even though it performs slightly worse, I'd go with the Row_Number version, because otherwise you're going to have to do something really screwy to build your query each time it needs to be used.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 27, 2009 at 3:16 pm
I figured it out in case anyone is interested.
declare @first_month datetime, @last_month datetime
select @first_month = DATEADD(m,DateDiff(m,0,Min(sale_datetime)),0),
@last_month = DATEADD(m,DateDiff(m,0,Max(sale_datetime)),0)
from #temp
select n.n, t1.*
from #nums n
cross apply (select top 10 *
from #temp
where sale_datetime >= DATEADD(m,n.n-1,@first_month)
and sale_datetime < DATEADD(m,n.n,@first_month)
order by sales_quantity desc) t1
where n.n <= DATEDIFF(m,@first_month,@last_month)
March 27, 2009 at 4:30 pm
Nice solution, Gabriel. Thanks for posting it up.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
March 28, 2009 at 9:44 pm
Gabriel P (3/27/2009)
I figured it out in case anyone is interested.
declare @first_month datetime, @last_month datetime
select @first_month = DATEADD(m,DateDiff(m,0,Min(sale_datetime)),0),
@last_month = DATEADD(m,DateDiff(m,0,Max(sale_datetime)),0)
from #temp
select n.n, t1.*
from #nums n
cross apply (select top 10 *
from #temp
where sale_datetime >= DATEADD(m,n.n-1,@first_month)
and sale_datetime < DATEADD(m,n.n,@first_month)
order by sales_quantity desc) t1
where n.n <= DATEDIFF(m,@first_month,@last_month)
With your test data and the code above, that returns 590 rows... if you want 5 years worth, that's 60 months and multiply that by 10 entries, you should be getting 600 rows returned. I'm thinking something isn't quite right... and, yes, I'm taking a look to see what.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 28, 2009 at 10:45 pm
Just add +1 to the following and you'll get your 60 months...
where n.n <= DATEDIFF(m,@first_month,@last_month)[font="Arial Black"]+1[/font]
--Jeff Moden
Change is inevitable... Change for the better is not.
March 30, 2009 at 5:49 pm
I wouldn't use a numbers table on this type of query. I'd rather use a properly indexed calendar type table. I've found that they are very useful for many applications and you don't have to use any sort of calculated dates in a WHERE clause, thus avoiding not being able to use any indexes on date columns.
Here's an example of a simple month type calendar table going from 1/1/2000 to 300 months later:
CREATE TABLE CalMonth
( MonthID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED
, MonthStart DATETIME
, NextMonth DATETIME
, MonthDescr CHAR(6)
)
CREATE INDEX CalMonth_Dates ON CalMonth (MonthStart, NextMonth)
-- Populate the calendar table.
;WITH TallyCTE AS (
SELECT 0 AS MonthAdd
UNION ALL
SELECT TOP 300 ROW_NUMBER() OVER (ORDER BY name)
FROM sys.syscolumns
)
INSERT INTO CalMonth
( MonthStart, NextMonth )
SELECT
DATEADD(month, T.MonthAdd, '1/1/2000')
, DATEADD(Month, T.MonthAdd, '2/1/2000')
FROM TallyCTE T
-- Fill in the month descriptions
UPDATE CalMonth
SET MonthDescr = LEFT(DATENAME(month, MonthStart), 3)
+ '-'
+ RIGHT(CONVERT(VARCHAR, DATEPART(year, MonthStart)), 2)
Here's your query using the current month and 59 months prior to that. There's no date calculations at all:
DECLARE
@FirstMonthINT
, @LastMonthINT
SELECT @LastMonth = MonthID FROM CalMonth
WHERE GETDATE() >= MonthStart
AND GETDATE() < NextMonth
SET @FirstMonth = @LastMonth - 59
SELECT CM.MonthID, CM.MonthDescr AS SalesMonth
, T1.*
FROM CalMonth CM
CROSS APPLY
(SELECT TOP 10 * FROM #temp
WHERE sale_datetime >= CM.MonthStart
AND sale_datetime < CM.NextMonth
ORDER BY #temp.sales_quantity DESC
) AS T1
WHERE
CM.MonthID BETWEEN @FirstMonth And @LastMonth
Put your own ORDER BY at the bottom.
Todd Fifield
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply