June 9, 2010 at 7:28 am
Hi,
I have been tasked with writing a query that generates a result set for use in a web application.
The result set is a side by side comparison (month on month) of the last two years of transactions for a hotel group.
There are close to 2 million stay (transaction) records in the live data, and the query I have written takes 10 seconds to run.
I had hoped to do better than this so started creating the ddls and a query to populate a table to post here, and when I ran the report query execution completed in 25% time of the live data. Fragmentation on the live data index is minimal at 0.01%. Please find execution plan of query on live data attached.
Ok, here's the ddl and script for populating the test table:
NOTE: THIS SCRIPT CREATES A TEMPORARY TABLE WITH 2 MILLION ROWS
USE tempdb;
IF OBJECT_ID('tempdb..#Stay') IS NOT NULL
DROP TABLE #Stay;
CREATE TABLE #Stay(
[StayID] [uniqueidentifier] NOT NULL,
[GuestID] [uniqueidentifier] NOT NULL,
[ArrivalDate] [datetime] NOT NULL,
[TotalRevenue] [money] NOT NULL
CONSTRAINT [PK_Stay] PRIMARY KEY NONCLUSTERED
(
[StayID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
ALTER TABLE #Stay ADD CONSTRAINT [DF_Stay_StayID] DEFAULT (newsequentialid()) FOR [StayID];
CREATE CLUSTERED INDEX [IX_Stay_ArrivalDateGuestID] ON #Stay
(
[ArrivalDate] ASC,
[GuestID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY];
--Populate with one million rows of random data, thanks to Jeff for this (http://www.sqlservercentral.com/articles/Crosstab/65048/)
INSERT INTO #Stay (GuestID, ArrivalDate, TotalRevenue)
SELECT TOP 2000000
NEWID(),
CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY)
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2
And here is the report query that I want to improve performance on:
USE tempdb
DECLARE @StartYear1 DATETIME;
DECLARE @StartYear2 DATETIME;
SET @StartYear2 = DATEADD(YY, -1, DATEADD(MM, DATEDIFF(MM, 0, GETDATE()), 0));
SET @StartYear1 = DATEADD(YY, -1, @StartYear2);
SELECT
DATEADD(MM, N - 1, @StartYear1) Year1,
SUM(CASE WHEN ArrivalDate >= @StartYear1 AND ArrivalDate < @StartYear2 THEN TotalRevenue ELSE 0 END) Year1Revenue,
DATEADD(MM, N -1, @StartYear2) Year2,
SUM(CASE WHEN ArrivalDate >= @StartYear2 THEN TotalRevenue ELSE 0 END) Year2Revenue
FROM Master.dbo.Tally
LEFT JOIN #Stay
ON
DATEADD(MM, N - 1, @StartYear1) = DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0)
OR DATEADD(MM, N - 1, @StartYear2) = DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0)
WHERE ArrivalDate >= @StartYear1 AND N <= 12
GROUP BY N;
Hopefully someone can help?
Thanks in advance 🙂
June 9, 2010 at 8:01 am
The first thing I see that I'm confused by is that your query is actually doing a clustered index seek... with the DateAdd and DateDiff functions being used around the ArrivalDate, I did not expect that.
Question: does the ArrivalDate contain times, or is it just a date?
If yes, you might want to recode the join condition to be something like:
(
ArrivalDate >= DATEADD(MM, N - 1, @StartYear1) AND
ArrivalDate < DateAdd(day, 1, DATEADD(MM, N - 1, @StartYear1))
)
OR
(
ArrivalDate >= DATEADD(MM, N - 1, @StartYear2) AND
ArrivalDate < DateAdd(day, 1, DATEADD(MM, N - 1, @StartYear2))
)
If not, take out the DateAdd/DateDiff from around that column.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
June 9, 2010 at 8:19 am
Thanks for the reply Wayne,
Arrival date has no time associated with it on live data.
However, not sure what you mean with regards the join?
Checking the report query with my join condition, with the query below, gives me the correct answer on both live and dummy.
SELECT SUM(TotalRevenue)
FROM #Stay
WHERE
ArrivalDate>='2008-06-01 00:00:00.000'
AND ArrivalDate<'2008-07-01 00:00:00.000';
June 9, 2010 at 8:31 am
Ah, i see where the confusion was arising, you added a day, instead of a month in the less than part of each join condition:
So report query with your join condition would be:
SELECT
DATEADD(MM, N - 1, @StartYear1) Year1,
SUM(CASE WHEN ArrivalDate >= @StartYear1 AND ArrivalDate < @StartYear2 THEN TotalRevenue ELSE 0 END) Year1Revenue,
DATEADD(MM, N -1, @StartYear2) Year2,
SUM(CASE WHEN ArrivalDate >= @StartYear2 THEN TotalRevenue ELSE 0 END) Year2Revenue
FROM Master.dbo.Tally
LEFT JOIN #Stay
ON
(
ArrivalDate >= DATEADD(MM, N - 1, @StartYear1) AND
ArrivalDate < DateAdd(mm, 1, DATEADD(MM, N - 1, @StartYear1))
)
OR
(
ArrivalDate >= DATEADD(MM, N - 1, @StartYear2) AND
ArrivalDate < DateAdd(mm, 1, DATEADD(MM, N - 1, @StartYear2))
)
WHERE ArrivalDate >= @StartYear1 AND N <= 12
GROUP BY N;
Have I got that right? Unfortunately this is taking twice as long to execute as the original report query :S
June 9, 2010 at 8:32 am
Also,
here's the script to populate the temp test table with arrival dates sans time component.
INSERT INTO #Stay (GuestID, ArrivalDate, TotalRevenue)
SELECT TOP 2000000
NEWID(),
DATEADD(DD, DATEDIFF(DD, 0, CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME)), 0),
CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY)
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2
June 10, 2010 at 3:15 am
Hi,
I got this working a lot faster (registering zero seconds :w00t:) by getting rid of the dateadd datediff out of the join statement.
The report query now looks like:
SELECT
DATEADD(MM, N - 1, @StartYear1) Year1,
SUM(CASE WHEN MonthStart < @StartYear2 THEN TotalRevenue ELSE 0 END) Year1Revenue,
DATEADD(MM, N -1, @StartYear2) Year2,
SUM(CASE WHEN MonthStart >= @StartYear2 THEN TotalRevenue ELSE 0 END) Year2Revenue
FROM Tally
LEFT JOIN
(
SELECT
DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0) MonthStart,
SUM(TotalRevenue) TotalRevenue
FROM #Stay
WHERE ArrivalDate >= @StartYear1
GROUP BY DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0)
) Aggregation
ON
DATEADD(MM, N - 1, @StartYear1) = MonthStart
OR DATEADD(MM, N - 1, @StartYear2) = MonthStart
WHERE N <= 12
GROUP BY N;
Many thanks for your input 🙂
Allister
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply