Revenue per month report — cross tab with tally

Question

Revenue per month report — cross tab with tally

Allister Reid

SSCrazy

Points: 2666
More actions
June 9, 2010 at 7:28 am

#387107

Hi,
I have been tasked with writing a query that generates a result set for use in a web application.
The result set is a side by side comparison (month on month) of the last two years of transactions for a hotel group.
There are close to 2 million stay (transaction) records in the live data, and the query I have written takes 10 seconds to run.
I had hoped to do better than this so started creating the ddls and a query to populate a table to post here, and when I ran the report query execution completed in 25% time of the live data. Fragmentation on the live data index is minimal at 0.01%. Please find execution plan of query on live data attached.
Ok, here's the ddl and script for populating the test table:
NOTE: THIS SCRIPT CREATES A TEMPORARY TABLE WITH 2 MILLION ROWS
USE tempdb;
IF OBJECT_ID('tempdb..#Stay') IS NOT NULL
DROP TABLE #Stay;
CREATE TABLE #Stay(
[StayID] [uniqueidentifier] NOT NULL,
[GuestID] [uniqueidentifier] NOT NULL,
[ArrivalDate] [datetime] NOT NULL,
[TotalRevenue] [money] NOT NULL
CONSTRAINT [PK_Stay] PRIMARY KEY NONCLUSTERED
(
[StayID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
ALTER TABLE #Stay ADD CONSTRAINT [DF_Stay_StayID] DEFAULT (newsequentialid()) FOR [StayID];
CREATE CLUSTERED INDEX [IX_Stay_ArrivalDateGuestID] ON #Stay
(
[ArrivalDate] ASC,
[GuestID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY];
--Populate with one million rows of random data, thanks to Jeff for this (http://www.sqlservercentral.com/articles/Crosstab/65048/)
INSERT INTO #Stay (GuestID, ArrivalDate, TotalRevenue)
SELECT TOP 2000000
NEWID(),
CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY)
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2
And here is the report query that I want to improve performance on:
USE tempdb
DECLARE @StartYear1 DATETIME;
DECLARE @StartYear2 DATETIME;
SET @StartYear2 = DATEADD(YY, -1, DATEADD(MM, DATEDIFF(MM, 0, GETDATE()), 0));
SET @StartYear1 = DATEADD(YY, -1, @StartYear2);
SELECT
DATEADD(MM, N - 1, @StartYear1) Year1,
SUM(CASE WHEN ArrivalDate >= @StartYear1 AND ArrivalDate < @StartYear2 THEN TotalRevenue ELSE 0 END) Year1Revenue,
DATEADD(MM, N -1, @StartYear2) Year2,
SUM(CASE WHEN ArrivalDate >= @StartYear2 THEN TotalRevenue ELSE 0 END) Year2Revenue
FROM Master.dbo.Tally
LEFT JOIN #Stay
ON
DATEADD(MM, N - 1, @StartYear1) = DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0)
OR DATEADD(MM, N - 1, @StartYear2) = DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0)
WHERE ArrivalDate >= @StartYear1 AND N <= 12
GROUP BY N;
Hopefully someone can help?
Thanks in advance 🙂

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

WayneS SSC Guru Points: 95460 More actions · Answer 1

The first thing I see that I'm confused by is that your query is actually doing a clustered index seek... with the DateAdd and DateDiff functions being used around the ArrivalDate, I did not expect that.

Question: does the ArrivalDate contain times, or is it just a date?

If yes, you might want to recode the join condition to be something like:

(

ArrivalDate >= DATEADD(MM, N - 1, @StartYear1) AND

ArrivalDate < DateAdd(day, 1, DATEADD(MM, N - 1, @StartYear1))

)

OR

(

ArrivalDate >= DATEADD(MM, N - 1, @StartYear2) AND

ArrivalDate < DateAdd(day, 1, DATEADD(MM, N - 1, @StartYear2))

)

If not, take out the DateAdd/DateDiff from around that column.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Allister Reid SSCrazy Points: 2666 More actions · Answer 2

Thanks for the reply Wayne,

Arrival date has no time associated with it on live data.

However, not sure what you mean with regards the join?

Checking the report query with my join condition, with the query below, gives me the correct answer on both live and dummy.

SELECT SUM(TotalRevenue)

FROM #Stay

WHERE

ArrivalDate>='2008-06-01 00:00:00.000'

AND ArrivalDate<'2008-07-01 00:00:00.000';

Allister Reid SSCrazy Points: 2666 More actions · Answer 3

Ah, i see where the confusion was arising, you added a day, instead of a month in the less than part of each join condition:

So report query with your join condition would be:

SELECT

DATEADD(MM, N - 1, @StartYear1) Year1,

SUM(CASE WHEN ArrivalDate >= @StartYear1 AND ArrivalDate < @StartYear2 THEN TotalRevenue ELSE 0 END) Year1Revenue,

DATEADD(MM, N -1, @StartYear2) Year2,

SUM(CASE WHEN ArrivalDate >= @StartYear2 THEN TotalRevenue ELSE 0 END) Year2Revenue

FROM Master.dbo.Tally

LEFT JOIN #Stay

ON

(

ArrivalDate >= DATEADD(MM, N - 1, @StartYear1) AND

ArrivalDate < DateAdd(mm, 1, DATEADD(MM, N - 1, @StartYear1))

)

OR

(

ArrivalDate >= DATEADD(MM, N - 1, @StartYear2) AND

ArrivalDate < DateAdd(mm, 1, DATEADD(MM, N - 1, @StartYear2))

)

WHERE ArrivalDate >= @StartYear1 AND N <= 12

GROUP BY N;

Have I got that right? Unfortunately this is taking twice as long to execute as the original report query :S

Allister Reid SSCrazy Points: 2666 More actions · Answer 4

Also,

here's the script to populate the temp test table with arrival dates sans time component.

INSERT INTO #Stay (GuestID, ArrivalDate, TotalRevenue)

SELECT TOP 2000000

NEWID(),

DATEADD(DD, DATEDIFF(DD, 0, CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME)), 0),

CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY)

FROM Master.dbo.SysColumns t1

CROSS JOIN Master.dbo.SysColumns t2

Allister Reid SSCrazy Points: 2666 More actions · Answer 5

Hi,

I got this working a lot faster (registering zero seconds :w00t:) by getting rid of the dateadd datediff out of the join statement.

The report query now looks like:

SELECT

DATEADD(MM, N - 1, @StartYear1) Year1,

SUM(CASE WHEN MonthStart < @StartYear2 THEN TotalRevenue ELSE 0 END) Year1Revenue,

DATEADD(MM, N -1, @StartYear2) Year2,

SUM(CASE WHEN MonthStart >= @StartYear2 THEN TotalRevenue ELSE 0 END) Year2Revenue

FROM Tally

LEFT JOIN

(

SELECT

DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0) MonthStart,

SUM(TotalRevenue) TotalRevenue

FROM #Stay

WHERE ArrivalDate >= @StartYear1

GROUP BY DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0)

) Aggregation

ON

DATEADD(MM, N - 1, @StartYear1) = MonthStart

OR DATEADD(MM, N - 1, @StartYear2) = MonthStart

WHERE N <= 12

GROUP BY N;

Many thanks for your input 🙂

Allister