Revenue per month report — cross tab with tally

  • Hi,

    I have been tasked with writing a query that generates a result set for use in a web application.

    The result set is a side by side comparison (month on month) of the last two years of transactions for a hotel group.

    There are close to 2 million stay (transaction) records in the live data, and the query I have written takes 10 seconds to run.

    I had hoped to do better than this so started creating the ddls and a query to populate a table to post here, and when I ran the report query execution completed in 25% time of the live data. Fragmentation on the live data index is minimal at 0.01%. Please find execution plan of query on live data attached.

    Ok, here's the ddl and script for populating the test table:

    NOTE: THIS SCRIPT CREATES A TEMPORARY TABLE WITH 2 MILLION ROWS

    USE tempdb;

    IF OBJECT_ID('tempdb..#Stay') IS NOT NULL

    DROP TABLE #Stay;

    CREATE TABLE #Stay(

    [StayID] [uniqueidentifier] NOT NULL,

    [GuestID] [uniqueidentifier] NOT NULL,

    [ArrivalDate] [datetime] NOT NULL,

    [TotalRevenue] [money] NOT NULL

    CONSTRAINT [PK_Stay] PRIMARY KEY NONCLUSTERED

    (

    [StayID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY];

    ALTER TABLE #Stay ADD CONSTRAINT [DF_Stay_StayID] DEFAULT (newsequentialid()) FOR [StayID];

    CREATE CLUSTERED INDEX [IX_Stay_ArrivalDateGuestID] ON #Stay

    (

    [ArrivalDate] ASC,

    [GuestID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY];

    --Populate with one million rows of random data, thanks to Jeff for this (http://www.sqlservercentral.com/articles/Crosstab/65048/)

    INSERT INTO #Stay (GuestID, ArrivalDate, TotalRevenue)

    SELECT TOP 2000000

    NEWID(),

    CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

    CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY)

    FROM Master.dbo.SysColumns t1

    CROSS JOIN Master.dbo.SysColumns t2

    And here is the report query that I want to improve performance on:

    USE tempdb

    DECLARE @StartYear1 DATETIME;

    DECLARE @StartYear2 DATETIME;

    SET @StartYear2 = DATEADD(YY, -1, DATEADD(MM, DATEDIFF(MM, 0, GETDATE()), 0));

    SET @StartYear1 = DATEADD(YY, -1, @StartYear2);

    SELECT

    DATEADD(MM, N - 1, @StartYear1) Year1,

    SUM(CASE WHEN ArrivalDate >= @StartYear1 AND ArrivalDate < @StartYear2 THEN TotalRevenue ELSE 0 END) Year1Revenue,

    DATEADD(MM, N -1, @StartYear2) Year2,

    SUM(CASE WHEN ArrivalDate >= @StartYear2 THEN TotalRevenue ELSE 0 END) Year2Revenue

    FROM Master.dbo.Tally

    LEFT JOIN #Stay

    ON

    DATEADD(MM, N - 1, @StartYear1) = DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0)

    OR DATEADD(MM, N - 1, @StartYear2) = DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0)

    WHERE ArrivalDate >= @StartYear1 AND N <= 12

    GROUP BY N;

    Hopefully someone can help?

    Thanks in advance 🙂

  • The first thing I see that I'm confused by is that your query is actually doing a clustered index seek... with the DateAdd and DateDiff functions being used around the ArrivalDate, I did not expect that.

    Question: does the ArrivalDate contain times, or is it just a date?

    If yes, you might want to recode the join condition to be something like:

    (

    ArrivalDate >= DATEADD(MM, N - 1, @StartYear1) AND

    ArrivalDate < DateAdd(day, 1, DATEADD(MM, N - 1, @StartYear1))

    )

    OR

    (

    ArrivalDate >= DATEADD(MM, N - 1, @StartYear2) AND

    ArrivalDate < DateAdd(day, 1, DATEADD(MM, N - 1, @StartYear2))

    )

    If not, take out the DateAdd/DateDiff from around that column.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Thanks for the reply Wayne,

    Arrival date has no time associated with it on live data.

    However, not sure what you mean with regards the join?

    Checking the report query with my join condition, with the query below, gives me the correct answer on both live and dummy.

    SELECT SUM(TotalRevenue)

    FROM #Stay

    WHERE

    ArrivalDate>='2008-06-01 00:00:00.000'

    AND ArrivalDate<'2008-07-01 00:00:00.000';

  • Ah, i see where the confusion was arising, you added a day, instead of a month in the less than part of each join condition:

    So report query with your join condition would be:

    SELECT

    DATEADD(MM, N - 1, @StartYear1) Year1,

    SUM(CASE WHEN ArrivalDate >= @StartYear1 AND ArrivalDate < @StartYear2 THEN TotalRevenue ELSE 0 END) Year1Revenue,

    DATEADD(MM, N -1, @StartYear2) Year2,

    SUM(CASE WHEN ArrivalDate >= @StartYear2 THEN TotalRevenue ELSE 0 END) Year2Revenue

    FROM Master.dbo.Tally

    LEFT JOIN #Stay

    ON

    (

    ArrivalDate >= DATEADD(MM, N - 1, @StartYear1) AND

    ArrivalDate < DateAdd(mm, 1, DATEADD(MM, N - 1, @StartYear1))

    )

    OR

    (

    ArrivalDate >= DATEADD(MM, N - 1, @StartYear2) AND

    ArrivalDate < DateAdd(mm, 1, DATEADD(MM, N - 1, @StartYear2))

    )

    WHERE ArrivalDate >= @StartYear1 AND N <= 12

    GROUP BY N;

    Have I got that right? Unfortunately this is taking twice as long to execute as the original report query :S

  • Also,

    here's the script to populate the temp test table with arrival dates sans time component.

    INSERT INTO #Stay (GuestID, ArrivalDate, TotalRevenue)

    SELECT TOP 2000000

    NEWID(),

    DATEADD(DD, DATEDIFF(DD, 0, CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME)), 0),

    CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY)

    FROM Master.dbo.SysColumns t1

    CROSS JOIN Master.dbo.SysColumns t2

  • Hi,

    I got this working a lot faster (registering zero seconds :w00t:) by getting rid of the dateadd datediff out of the join statement.

    The report query now looks like:

    SELECT

    DATEADD(MM, N - 1, @StartYear1) Year1,

    SUM(CASE WHEN MonthStart < @StartYear2 THEN TotalRevenue ELSE 0 END) Year1Revenue,

    DATEADD(MM, N -1, @StartYear2) Year2,

    SUM(CASE WHEN MonthStart >= @StartYear2 THEN TotalRevenue ELSE 0 END) Year2Revenue

    FROM Tally

    LEFT JOIN

    (

    SELECT

    DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0) MonthStart,

    SUM(TotalRevenue) TotalRevenue

    FROM #Stay

    WHERE ArrivalDate >= @StartYear1

    GROUP BY DATEADD(MM, DATEDIFF(MM, 0, ArrivalDate), 0)

    ) Aggregation

    ON

    DATEADD(MM, N - 1, @StartYear1) = MonthStart

    OR DATEADD(MM, N - 1, @StartYear2) = MonthStart

    WHERE N <= 12

    GROUP BY N;

    Many thanks for your input 🙂

    Allister

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply